{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "本文是上一个策略“机器学习-动态因子选择策略”的详细分析讲解篇\n",
    "\n",
    "欢迎访问我的Github: https://github.com/charliedream1/ai_quant_trade\n",
    "\n",
    "策略说明：\n",
    "问题：如果回测某个区间最大回撤很大，说明这个时间点选取的因子可能不合适，如何自动判断因子重要性，并选择？\n",
    "\n",
    "因子选择：\n",
    "- 基本面因子：https://www.joinquant.com/help/api/help#name:Stock\n",
    "- 技术分析指标因子：https://www.joinquant.com/help/api/help#name:technicalanalysis\n",
    "\n",
    "策略思路\n",
    "1. 因子筛选：通过基本面和技术面人工选择需要使用的因子\n",
    "2. 训练决策树：对长周期收益增加的打标签1，否则0。对收益进行分类。之后，按照因子的重要性，选择top的因子\n",
    "3. 训练回归支持向量机：使用挑选的重要因子训练。真实市值和模型预测的市值差，找到预测和真实值差值最小的选择购买\n",
    "\n",
    "实验结论：\n",
    "1. 特征选择：(a)随机森林比决策树性能好 (b)特征选择过多效果并不如挑选几个重要的特征, 测试使用10个特征性能 > 20个  > 大于使用所有 (c) 使用随机森林对特征重要性选择，n_estimators越大，回归模型性能越好 (d)\n",
    "2. 市值回归预测：(a)随机森林 > 线性核的SVR (b) rbf核和DNN的score为负数"
   ]
  },
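  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of steps 2 and 3 of the outline above, on synthetic data rather than JoinQuant data. The factor matrix, returns, market caps, `top_k`, and every variable name are illustrative assumptions. A random forest stands in for both the decision tree and the SVR, following the experimental conclusions, and ranking by actual minus predicted market cap is only one possible reading of step 3.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "n_stocks, n_factors, top_k = 300, 20, 10  # hypothetical sizes\n",
    "\n",
    "# Synthetic stand-ins for the factor matrix, long-horizon return and market cap\n",
    "X = rng.normal(size=(n_stocks, n_factors))\n",
    "future_ret = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n_stocks)\n",
    "market_cap = 100 + 20 * X[:, 0] + 10 * X[:, 2] + rng.normal(scale=5, size=n_stocks)\n",
    "\n",
    "# Step 2: label 1 if the long-horizon return is positive, else 0,\n",
    "# then rank factors by importance and keep the top_k\n",
    "labels = (future_ret > 0).astype(int)\n",
    "clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)\n",
    "top_idx = np.argsort(clf.feature_importances_)[::-1][:top_k]\n",
    "\n",
    "# Step 3: regress market cap on the selected factors, then rank stocks by\n",
    "# actual minus predicted cap (most negative = cheapest relative to the model)\n",
    "reg = RandomForestRegressor(n_estimators=200, random_state=0)\n",
    "reg.fit(X[:, top_idx], market_cap)\n",
    "gap = market_cap - reg.predict(X[:, top_idx])\n",
    "buy_candidates = np.argsort(gap)[:10]\n",
    "print('selected factor columns:', sorted(top_idx.tolist()))\n",
    "print('candidate row indices:', buy_candidates.tolist())\n",
    "```\n",
    "\n",
    "Both models are fitted and scored on the same rows here only to keep the sketch short; a real backtest would fit on past dates and score on later ones."
   ]
  },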
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from jqlib.technical_analysis import *\n",
    "from jqdata import *\n",
    "\n",
    "# 2. 导入其它库\n",
    "import datetime\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from sklearn.svm import SVR\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.preprocessing import StandardScaler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. 基本数据情况分析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(datetime.date(2022, 6, 28),\n",
       " datetime.date(2022, 6, 27),\n",
       " datetime.date(2022, 6, 23))"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "today = datetime.date.today()\n",
    "yesterday = today - datetime.timedelta(days = 1)\n",
    "five_days_ago = today - datetime.timedelta(days = 5)\n",
    "today, yesterday, five_days_ago"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2022-06-23    4343.88\n",
       "2022-06-24    4394.77\n",
       "2022-06-27    4444.26\n",
       "2022-06-28    4444.26\n",
       "Name: close, dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 使用的股票池，使用沪深300\n",
    "ref_index_stock = '000300.XSHG'\n",
    "# 获取大盘收盘价\n",
    "hs300_close = get_price(ref_index_stock, five_days_ago, today, fq='pre')['close']\n",
    "hs300_close.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.023108373159479667"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 大盘相对5天前涨幅\n",
    "hs300_ret = hs300_close[-1] / hs300_close[0] - 1\n",
    "hs300_ret"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "current_price 4444.26\n",
      "ma_5days 4355.82\n",
      "ma_20days 4231.574500000001\n",
      "ma_125days 4254.344700000001\n"
     ]
    }
   ],
   "source": [
    "close_data = attribute_history(ref_index_stock, 100, '1d', ['close'], df=False)\n",
    "# 设定均线\n",
    "n1 = 5\n",
    "n2 = 20\n",
    "n3 = 125\n",
    "# 取得过去5个交易日的平均收盘价\n",
    "ma_n1 = close_data['close'][-n1:].mean()\n",
    "# 取得过去20个交易日的平均收盘价\n",
    "ma_n2 = close_data['close'][-n2:].mean()\n",
    "# 取得过去125个交易日的平均收盘价\n",
    "ma_n3 = close_data['close'][-n3:].mean()\n",
    "# 取得上一时间点价格\n",
    "current_price = close_data['close'][-1]\n",
    "print('current_price', current_price)\n",
    "print('ma_5days', ma_n1)\n",
    "print('ma_20days', ma_n2)\n",
    "print('ma_125days', ma_n3)"
   ]
  },
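  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, the same return and moving-average calculations can be sketched with plain pandas on a synthetic price series. The series, the dates, and the variable names are assumptions made for illustration; nothing here calls the JoinQuant API.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Synthetic daily closes standing in for the index series\n",
    "rng = np.random.default_rng(1)\n",
    "dates = pd.bdate_range('2022-01-03', periods=150)\n",
    "close = pd.Series(4000 + rng.normal(0, 20, size=150).cumsum(), index=dates)\n",
    "\n",
    "windows = (5, 20, 125)\n",
    "# min_periods defaults to the window size, so a window longer than the\n",
    "# available history yields NaN instead of a silently shorter average\n",
    "ma = {n: close.rolling(n).mean().iloc[-1] for n in windows}\n",
    "# Relative change over the last 5 bars\n",
    "ret_5d = close.iloc[-1] / close.iloc[-6] - 1\n",
    "\n",
    "for n in windows:\n",
    "    print(f'ma_{n}days', round(ma[n], 2))\n",
    "print('ret_5d', round(ret_5d, 4))\n",
    "```\n",
    "\n",
    "Using `rolling` makes a too-short history visible: with only 100 bars, the 125-day value would come back as NaN."
   ]
  },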
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['000001.XSHE',\n",
       " '000002.XSHE',\n",
       " '000063.XSHE',\n",
       " '000066.XSHE',\n",
       " '000069.XSHE',\n",
       " '000100.XSHE',\n",
       " '000157.XSHE',\n",
       " '000166.XSHE',\n",
       " '000301.XSHE',\n",
       " '000333.XSHE']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 获取沪深300股票列表\n",
    "hs300_stock_lst = get_index_stocks('000300.XSHG')\n",
    "hs300_stock_lst[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. 数据获取"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  1.1 加载基本面因子数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code</th>\n",
       "      <th>market_cap</th>\n",
       "      <th>anon_1</th>\n",
       "      <th>anon_2</th>\n",
       "      <th>anon_3</th>\n",
       "      <th>anon_4</th>\n",
       "      <th>anon_5</th>\n",
       "      <th>inc_total_revenue_year_on_year</th>\n",
       "      <th>turnover_ratio</th>\n",
       "      <th>pe_ratio</th>\n",
       "      <th>pb_ratio</th>\n",
       "      <th>ps_ratio</th>\n",
       "      <th>roa</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>000001.XSHE</td>\n",
       "      <td>2811.9175</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-4.061748e+11</td>\n",
       "      <td>11.600355</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.079363</td>\n",
       "      <td>10.57</td>\n",
       "      <td>0.7716</td>\n",
       "      <td>7.2001</td>\n",
       "      <td>0.8363</td>\n",
       "      <td>1.6179</td>\n",
       "      <td>0.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>000002.XSHE</td>\n",
       "      <td>2151.8584</td>\n",
       "      <td>3.048454e+11</td>\n",
       "      <td>-3.933097e+11</td>\n",
       "      <td>6.560801</td>\n",
       "      <td>0.173873</td>\n",
       "      <td>0.121636</td>\n",
       "      <td>0.65</td>\n",
       "      <td>0.7211</td>\n",
       "      <td>9.4960</td>\n",
       "      <td>0.9085</td>\n",
       "      <td>0.4748</td>\n",
       "      <td>0.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>000063.XSHE</td>\n",
       "      <td>1192.6785</td>\n",
       "      <td>5.743872e+10</td>\n",
       "      <td>-5.531577e+10</td>\n",
       "      <td>2.306207</td>\n",
       "      <td>0.226818</td>\n",
       "      <td>0.299797</td>\n",
       "      <td>6.43</td>\n",
       "      <td>1.0023</td>\n",
       "      <td>17.4166</td>\n",
       "      <td>2.2699</td>\n",
       "      <td>1.0263</td>\n",
       "      <td>1.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>000066.XSHE</td>\n",
       "      <td>344.1928</td>\n",
       "      <td>9.296436e+09</td>\n",
       "      <td>-1.468575e+10</td>\n",
       "      <td>1.436590</td>\n",
       "      <td>0.370459</td>\n",
       "      <td>0.397642</td>\n",
       "      <td>-9.24</td>\n",
       "      <td>0.8867</td>\n",
       "      <td>53.8264</td>\n",
       "      <td>2.5271</td>\n",
       "      <td>1.9665</td>\n",
       "      <td>-0.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>000069.XSHE</td>\n",
       "      <td>477.3444</td>\n",
       "      <td>1.471469e+11</td>\n",
       "      <td>-1.244737e+11</td>\n",
       "      <td>4.345297</td>\n",
       "      <td>0.196149</td>\n",
       "      <td>0.170275</td>\n",
       "      <td>-12.56</td>\n",
       "      <td>1.2700</td>\n",
       "      <td>15.5290</td>\n",
       "      <td>0.6153</td>\n",
       "      <td>0.4702</td>\n",
       "      <td>0.09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          code  market_cap        anon_1  ...   pb_ratio  ps_ratio   roa\n",
       "0  000001.XSHE   2811.9175           NaN  ...     0.8363    1.6179  0.26\n",
       "1  000002.XSHE   2151.8584  3.048454e+11  ...     0.9085    0.4748  0.14\n",
       "2  000063.XSHE   1192.6785  5.743872e+10  ...     2.2699    1.0263  1.17\n",
       "3  000066.XSHE    344.1928  9.296436e+09  ...     2.5271    1.9665 -0.32\n",
       "4  000069.XSHE    477.3444  1.471469e+11  ...     0.6153    0.4702  0.09\n",
       "\n",
       "[5 rows x 13 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 创建query对象，指定获取股票的代码、市值、净运营资本\n",
    "# 净债务、产权比率、股东权益比率、营收增长率、换手率、\n",
    "# 市盈率（PE）、市净率（PB）、市销率（PS）、总资产收益率因子\n",
    "q = query(valuation.code, valuation.market_cap,\n",
    "          balance.total_current_assets - balance.total_current_liability,\n",
    "          balance.total_liability - balance.total_assets,\n",
    "          balance.total_liability / balance.equities_parent_company_owners,\n",
    "          (balance.total_assets - balance.total_current_assets) / balance.total_assets,\n",
    "          balance.equities_parent_company_owners / balance.total_assets,\n",
    "          indicator.inc_total_revenue_year_on_year,\n",
    "          valuation.turnover_ratio,\n",
    "          valuation.pe_ratio,\n",
    "          valuation.pb_ratio,\n",
    "          valuation.ps_ratio, indicator.roa).filter(\n",
    "    valuation.code.in_(hs300_stock_lst))\n",
    "# 将获得的因子值存入一个数据表\n",
    "df = get_fundamentals(q, date=None)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>市值</th>\n",
       "      <th>净营运资本</th>\n",
       "      <th>净债务</th>\n",
       "      <th>产权比率</th>\n",
       "      <th>非流动资产比率</th>\n",
       "      <th>股东权益比率</th>\n",
       "      <th>营收增长率</th>\n",
       "      <th>换手率</th>\n",
       "      <th>PE</th>\n",
       "      <th>PB</th>\n",
       "      <th>PS</th>\n",
       "      <th>总资产收益率</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>2811.9175</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-4.061748e+11</td>\n",
       "      <td>11.600355</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.079363</td>\n",
       "      <td>10.57</td>\n",
       "      <td>0.7716</td>\n",
       "      <td>7.2001</td>\n",
       "      <td>0.8363</td>\n",
       "      <td>1.6179</td>\n",
       "      <td>0.26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>2151.8584</td>\n",
       "      <td>3.048454e+11</td>\n",
       "      <td>-3.933097e+11</td>\n",
       "      <td>6.560801</td>\n",
       "      <td>0.173873</td>\n",
       "      <td>0.121636</td>\n",
       "      <td>0.65</td>\n",
       "      <td>0.7211</td>\n",
       "      <td>9.4960</td>\n",
       "      <td>0.9085</td>\n",
       "      <td>0.4748</td>\n",
       "      <td>0.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1192.6785</td>\n",
       "      <td>5.743872e+10</td>\n",
       "      <td>-5.531577e+10</td>\n",
       "      <td>2.306207</td>\n",
       "      <td>0.226818</td>\n",
       "      <td>0.299797</td>\n",
       "      <td>6.43</td>\n",
       "      <td>1.0023</td>\n",
       "      <td>17.4166</td>\n",
       "      <td>2.2699</td>\n",
       "      <td>1.0263</td>\n",
       "      <td>1.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>344.1928</td>\n",
       "      <td>9.296436e+09</td>\n",
       "      <td>-1.468575e+10</td>\n",
       "      <td>1.436590</td>\n",
       "      <td>0.370459</td>\n",
       "      <td>0.397642</td>\n",
       "      <td>-9.24</td>\n",
       "      <td>0.8867</td>\n",
       "      <td>53.8264</td>\n",
       "      <td>2.5271</td>\n",
       "      <td>1.9665</td>\n",
       "      <td>-0.32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>477.3444</td>\n",
       "      <td>1.471469e+11</td>\n",
       "      <td>-1.244737e+11</td>\n",
       "      <td>4.345297</td>\n",
       "      <td>0.196149</td>\n",
       "      <td>0.170275</td>\n",
       "      <td>-12.56</td>\n",
       "      <td>1.2700</td>\n",
       "      <td>15.5290</td>\n",
       "      <td>0.6153</td>\n",
       "      <td>0.4702</td>\n",
       "      <td>0.09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    市值         净营运资本   ...        PS  总资产收益率\n",
       "000001.XSHE  2811.9175           NaN   ...    1.6179    0.26\n",
       "000002.XSHE  2151.8584  3.048454e+11   ...    0.4748    0.14\n",
       "000063.XSHE  1192.6785  5.743872e+10   ...    1.0263    1.17\n",
       "000066.XSHE   344.1928  9.296436e+09   ...    1.9665   -0.32\n",
       "000069.XSHE   477.3444  1.471469e+11   ...    0.4702    0.09\n",
       "\n",
       "[5 rows x 12 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 把数据表的字段名指定为对应的因子名\n",
    "df.columns = ['code', '市值', '净营运资本',\n",
    "              '净债务', '产权比率', '非流动资产比率',\n",
    "              '股东权益比率', '营收增长率'\n",
    "    , '换手率', 'PE', 'PB', 'PS', '总资产收益率']\n",
    "# 将股票代码作为数据表的index\n",
    "df.index = df.code.values\n",
    "# 使用del也可以删除列\n",
    "del df['code']\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(datetime.date(2022, 6, 27), datetime.date(2022, 5, 9))"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 主要时间点设定\n",
    "today = datetime.date.today()\n",
    "# 设定2个时间差，分别是50天，1天\n",
    "delta50 = datetime.timedelta(days=50)\n",
    "delta1 = datetime.timedelta(days=1)\n",
    "# 50日前作为一个历史节点\n",
    "history_50ds = today - delta50\n",
    "# 再计算昨天的日期\n",
    "yesterday = today - delta1\n",
    "yesterday, history_50ds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 获取最新的技术因子"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>市值</th>\n",
       "      <th>净营运资本</th>\n",
       "      <th>净债务</th>\n",
       "      <th>产权比率</th>\n",
       "      <th>非流动资产比率</th>\n",
       "      <th>股东权益比率</th>\n",
       "      <th>营收增长率</th>\n",
       "      <th>换手率</th>\n",
       "      <th>PE</th>\n",
       "      <th>PB</th>\n",
       "      <th>PS</th>\n",
       "      <th>总资产收益率</th>\n",
       "      <th>动量线</th>\n",
       "      <th>成交量</th>\n",
       "      <th>累计能量线</th>\n",
       "      <th>平均差</th>\n",
       "      <th>指数移动平均</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>乖离率</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>2811.9175</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-4.061748e+11</td>\n",
       "      <td>11.600355</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.079363</td>\n",
       "      <td>10.57</td>\n",
       "      <td>0.7716</td>\n",
       "      <td>7.2001</td>\n",
       "      <td>0.8363</td>\n",
       "      <td>1.6179</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.58</td>\n",
       "      <td>1497278.18</td>\n",
       "      <td>668914281.0</td>\n",
       "      <td>-0.3986</td>\n",
       "      <td>14.308895</td>\n",
       "      <td>14.346</td>\n",
       "      <td>1.003764</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>2151.8584</td>\n",
       "      <td>3.048454e+11</td>\n",
       "      <td>-3.933097e+11</td>\n",
       "      <td>6.560801</td>\n",
       "      <td>0.173873</td>\n",
       "      <td>0.121636</td>\n",
       "      <td>0.65</td>\n",
       "      <td>0.7211</td>\n",
       "      <td>9.4960</td>\n",
       "      <td>0.9085</td>\n",
       "      <td>0.4748</td>\n",
       "      <td>0.14</td>\n",
       "      <td>1.04</td>\n",
       "      <td>700741.92</td>\n",
       "      <td>192617757.0</td>\n",
       "      <td>-0.3240</td>\n",
       "      <td>18.316412</td>\n",
       "      <td>18.330</td>\n",
       "      <td>0.981997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1192.6785</td>\n",
       "      <td>5.743872e+10</td>\n",
       "      <td>-5.531577e+10</td>\n",
       "      <td>2.306207</td>\n",
       "      <td>0.226818</td>\n",
       "      <td>0.299797</td>\n",
       "      <td>6.43</td>\n",
       "      <td>1.0023</td>\n",
       "      <td>17.4166</td>\n",
       "      <td>2.2699</td>\n",
       "      <td>1.0263</td>\n",
       "      <td>1.17</td>\n",
       "      <td>0.49</td>\n",
       "      <td>390225.97</td>\n",
       "      <td>57165038.0</td>\n",
       "      <td>1.5648</td>\n",
       "      <td>25.099482</td>\n",
       "      <td>25.285</td>\n",
       "      <td>-0.375717</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>344.1928</td>\n",
       "      <td>9.296436e+09</td>\n",
       "      <td>-1.468575e+10</td>\n",
       "      <td>1.436590</td>\n",
       "      <td>0.370459</td>\n",
       "      <td>0.397642</td>\n",
       "      <td>-9.24</td>\n",
       "      <td>0.8867</td>\n",
       "      <td>53.8264</td>\n",
       "      <td>2.5271</td>\n",
       "      <td>1.9665</td>\n",
       "      <td>-0.32</td>\n",
       "      <td>-0.09</td>\n",
       "      <td>260702.03</td>\n",
       "      <td>55492859.0</td>\n",
       "      <td>0.5460</td>\n",
       "      <td>10.561782</td>\n",
       "      <td>10.581</td>\n",
       "      <td>0.841130</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>477.3444</td>\n",
       "      <td>1.471469e+11</td>\n",
       "      <td>-1.244737e+11</td>\n",
       "      <td>4.345297</td>\n",
       "      <td>0.196149</td>\n",
       "      <td>0.170275</td>\n",
       "      <td>-12.56</td>\n",
       "      <td>1.2700</td>\n",
       "      <td>15.5290</td>\n",
       "      <td>0.6153</td>\n",
       "      <td>0.4702</td>\n",
       "      <td>0.09</td>\n",
       "      <td>0.48</td>\n",
       "      <td>896021.57</td>\n",
       "      <td>418744306.0</td>\n",
       "      <td>-0.2518</td>\n",
       "      <td>5.657840</td>\n",
       "      <td>5.630</td>\n",
       "      <td>3.374778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    市值         净营运资本    ...       移动平均       乖离率\n",
       "000001.XSHE  2811.9175           NaN    ...     14.346  1.003764\n",
       "000002.XSHE  2151.8584  3.048454e+11    ...     18.330  0.981997\n",
       "000063.XSHE  1192.6785  5.743872e+10    ...     25.285 -0.375717\n",
       "000066.XSHE   344.1928  9.296436e+09    ...     10.581  0.841130\n",
       "000069.XSHE   477.3444  1.471469e+11    ...      5.630  3.374778\n",
       "\n",
       "[5 rows x 19 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 下面就获取股票的动量线、成交量、累计能量线、平均差、\n",
    "# 指数移动平均、移动平均、乖离率等因子\n",
    "# 时间范围都设为10天\n",
    "df['动量线'] = list(MTM(df.index, yesterday,\n",
    "                     timeperiod=10, unit='1d',\n",
    "                     include_now=True,\n",
    "                     fq_ref_date=None).values())\n",
    "df['成交量'] = list(VOL(df.index, yesterday, M1=10,\n",
    "                     unit='1d', include_now=True,\n",
    "                     fq_ref_date=None)[0].values())\n",
    "df['累计能量线'] = list(OBV(df.index, check_date=yesterday,\n",
    "                       timeperiod=10).values())\n",
    "df['平均差'] = list(DMA(df.index, yesterday, N1=10,\n",
    "                     unit='1d', include_now=True,\n",
    "                     fq_ref_date=None)[0].values())\n",
    "df['指数移动平均'] = list(EMA(df.index, yesterday, timeperiod=10,\n",
    "                        unit='1d', include_now=True,\n",
    "                        fq_ref_date=None).values())\n",
    "df['移动平均'] = list(MA(df.index, yesterday, timeperiod=10,\n",
    "                     unit='1d', include_now=True,\n",
    "                     fq_ref_date=None).values())\n",
    "df['乖离率'] = list(BIAS(df.index, yesterday, N1=10,\n",
    "                      unit='1d', include_now=True,\n",
    "                      fq_ref_date=None)[0].values())\n",
    "df.head()"
   ]
  },
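  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make a few of these indicators concrete, here is a rough pandas sketch using common textbook definitions on a synthetic price series. The formulas are assumptions and may differ in detail from the `jqlib.technical_analysis` implementations, although the BIAS formula does agree with the first row of the tables in this notebook: (14.49 - 14.346) / 14.346 * 100 is about 1.0038.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Synthetic closing prices for one hypothetical stock\n",
    "rng = np.random.default_rng(2)\n",
    "close = pd.Series(20 + rng.normal(0, 0.3, size=60).cumsum())\n",
    "n = 10  # look-back period, matching the 10-day setting above\n",
    "\n",
    "mtm = close - close.shift(n)                  # momentum: change over n bars\n",
    "ma = close.rolling(n).mean()                  # simple moving average\n",
    "ema = close.ewm(span=n, adjust=False).mean()  # exponential moving average\n",
    "bias = (close - ma) / ma * 100                # percent deviation from the MA\n",
    "\n",
    "latest = pd.DataFrame({'MTM': mtm, 'MA': ma, 'EMA': ema, 'BIAS': bias}).iloc[-1]\n",
    "print(latest.round(4))\n",
    "```\n",
    "\n",
    "Only the last row is kept, mirroring how the cell above stores one value per stock for the check date."
   ]
  },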
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>市值</th>\n",
       "      <th>净营运资本</th>\n",
       "      <th>净债务</th>\n",
       "      <th>产权比率</th>\n",
       "      <th>非流动资产比率</th>\n",
       "      <th>股东权益比率</th>\n",
       "      <th>营收增长率</th>\n",
       "      <th>换手率</th>\n",
       "      <th>PE</th>\n",
       "      <th>PB</th>\n",
       "      <th>PS</th>\n",
       "      <th>总资产收益率</th>\n",
       "      <th>动量线</th>\n",
       "      <th>成交量</th>\n",
       "      <th>累计能量线</th>\n",
       "      <th>平均差</th>\n",
       "      <th>指数移动平均</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>乖离率</th>\n",
       "      <th>close1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>2811.9175</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>-4.061748e+11</td>\n",
       "      <td>11.600355</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.079363</td>\n",
       "      <td>10.57</td>\n",
       "      <td>0.7716</td>\n",
       "      <td>7.2001</td>\n",
       "      <td>0.8363</td>\n",
       "      <td>1.6179</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.58</td>\n",
       "      <td>1497278.18</td>\n",
       "      <td>668914281.0</td>\n",
       "      <td>-0.3986</td>\n",
       "      <td>14.308895</td>\n",
       "      <td>14.346</td>\n",
       "      <td>1.003764</td>\n",
       "      <td>14.49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>2151.8584</td>\n",
       "      <td>3.048454e+11</td>\n",
       "      <td>-3.933097e+11</td>\n",
       "      <td>6.560801</td>\n",
       "      <td>0.173873</td>\n",
       "      <td>0.121636</td>\n",
       "      <td>0.65</td>\n",
       "      <td>0.7211</td>\n",
       "      <td>9.4960</td>\n",
       "      <td>0.9085</td>\n",
       "      <td>0.4748</td>\n",
       "      <td>0.14</td>\n",
       "      <td>1.04</td>\n",
       "      <td>700741.92</td>\n",
       "      <td>192617757.0</td>\n",
       "      <td>-0.3240</td>\n",
       "      <td>18.316412</td>\n",
       "      <td>18.330</td>\n",
       "      <td>0.981997</td>\n",
       "      <td>18.51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1192.6785</td>\n",
       "      <td>5.743872e+10</td>\n",
       "      <td>-5.531577e+10</td>\n",
       "      <td>2.306207</td>\n",
       "      <td>0.226818</td>\n",
       "      <td>0.299797</td>\n",
       "      <td>6.43</td>\n",
       "      <td>1.0023</td>\n",
       "      <td>17.4166</td>\n",
       "      <td>2.2699</td>\n",
       "      <td>1.0263</td>\n",
       "      <td>1.17</td>\n",
       "      <td>0.49</td>\n",
       "      <td>390225.97</td>\n",
       "      <td>57165038.0</td>\n",
       "      <td>1.5648</td>\n",
       "      <td>25.099482</td>\n",
       "      <td>25.285</td>\n",
       "      <td>-0.375717</td>\n",
       "      <td>25.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>344.1928</td>\n",
       "      <td>9.296436e+09</td>\n",
       "      <td>-1.468575e+10</td>\n",
       "      <td>1.436590</td>\n",
       "      <td>0.370459</td>\n",
       "      <td>0.397642</td>\n",
       "      <td>-9.24</td>\n",
       "      <td>0.8867</td>\n",
       "      <td>53.8264</td>\n",
       "      <td>2.5271</td>\n",
       "      <td>1.9665</td>\n",
       "      <td>-0.32</td>\n",
       "      <td>-0.09</td>\n",
       "      <td>260702.03</td>\n",
       "      <td>55492859.0</td>\n",
       "      <td>0.5460</td>\n",
       "      <td>10.561782</td>\n",
       "      <td>10.581</td>\n",
       "      <td>0.841130</td>\n",
       "      <td>10.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>477.3444</td>\n",
       "      <td>1.471469e+11</td>\n",
       "      <td>-1.244737e+11</td>\n",
       "      <td>4.345297</td>\n",
       "      <td>0.196149</td>\n",
       "      <td>0.170275</td>\n",
       "      <td>-12.56</td>\n",
       "      <td>1.2700</td>\n",
       "      <td>15.5290</td>\n",
       "      <td>0.6153</td>\n",
       "      <td>0.4702</td>\n",
       "      <td>0.09</td>\n",
       "      <td>0.48</td>\n",
       "      <td>896021.57</td>\n",
       "      <td>418744306.0</td>\n",
       "      <td>-0.2518</td>\n",
       "      <td>5.657840</td>\n",
       "      <td>5.630</td>\n",
       "      <td>3.374778</td>\n",
       "      <td>5.82</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    市值         净营运资本   ...         乖离率  close1\n",
       "000001.XSHE  2811.9175  0.000000e+00   ...    1.003764   14.49\n",
       "000002.XSHE  2151.8584  3.048454e+11   ...    0.981997   18.51\n",
       "000063.XSHE  1192.6785  5.743872e+10   ...   -0.375717   25.19\n",
       "000066.XSHE   344.1928  9.296436e+09   ...    0.841130   10.67\n",
       "000069.XSHE   477.3444  1.471469e+11   ...    3.374778    5.82\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 把数据表中的空值用0来代替\n",
    "df.fillna(0, inplace=True)\n",
    "#获取股票前一日的收盘价\n",
    "df['close1']=list(get_price(hs300_stock_lst, \n",
    "                       end_date=yesterday, \n",
    "                       count = 1,\n",
    "                       fq='pre',panel=False)['close'])\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>市值</th>\n",
       "      <th>净营运资本</th>\n",
       "      <th>净债务</th>\n",
       "      <th>产权比率</th>\n",
       "      <th>非流动资产比率</th>\n",
       "      <th>股东权益比率</th>\n",
       "      <th>营收增长率</th>\n",
       "      <th>换手率</th>\n",
       "      <th>PE</th>\n",
       "      <th>PB</th>\n",
       "      <th>PS</th>\n",
       "      <th>总资产收益率</th>\n",
       "      <th>动量线</th>\n",
       "      <th>成交量</th>\n",
       "      <th>累计能量线</th>\n",
       "      <th>平均差</th>\n",
       "      <th>指数移动平均</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>乖离率</th>\n",
       "      <th>close1</th>\n",
       "      <th>close2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>688363.XSHG</th>\n",
       "      <td>678.2821</td>\n",
       "      <td>2.528823e+09</td>\n",
       "      <td>-6.011806e+09</td>\n",
       "      <td>0.233262</td>\n",
       "      <td>0.519220</td>\n",
       "      <td>0.810948</td>\n",
       "      <td>61.57</td>\n",
       "      <td>6.8649</td>\n",
       "      <td>81.7513</td>\n",
       "      <td>11.7413</td>\n",
       "      <td>12.5007</td>\n",
       "      <td>2.65</td>\n",
       "      <td>0.09</td>\n",
       "      <td>65843.71</td>\n",
       "      <td>5451705.0</td>\n",
       "      <td>10.8510</td>\n",
       "      <td>143.272482</td>\n",
       "      <td>143.857</td>\n",
       "      <td>-1.992951</td>\n",
       "      <td>140.99</td>\n",
       "      <td>122.52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>688396.XSHG</th>\n",
       "      <td>752.4524</td>\n",
       "      <td>1.165588e+10</td>\n",
       "      <td>-1.811441e+10</td>\n",
       "      <td>0.289148</td>\n",
       "      <td>0.329126</td>\n",
       "      <td>0.768991</td>\n",
       "      <td>22.94</td>\n",
       "      <td>1.8395</td>\n",
       "      <td>30.2507</td>\n",
       "      <td>4.2545</td>\n",
       "      <td>7.7426</td>\n",
       "      <td>2.68</td>\n",
       "      <td>-2.02</td>\n",
       "      <td>81142.24</td>\n",
       "      <td>-18673996.0</td>\n",
       "      <td>5.5240</td>\n",
       "      <td>57.125812</td>\n",
       "      <td>57.595</td>\n",
       "      <td>-1.033076</td>\n",
       "      <td>57.00</td>\n",
       "      <td>46.41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>688561.XSHG</th>\n",
       "      <td>370.6434</td>\n",
       "      <td>4.786264e+09</td>\n",
       "      <td>-9.395928e+09</td>\n",
       "      <td>0.376994</td>\n",
       "      <td>0.398691</td>\n",
       "      <td>0.725508</td>\n",
       "      <td>44.52</td>\n",
       "      <td>0.5683</td>\n",
       "      <td>-74.2944</td>\n",
       "      <td>3.9501</td>\n",
       "      <td>6.1649</td>\n",
       "      <td>-3.65</td>\n",
       "      <td>3.72</td>\n",
       "      <td>26110.62</td>\n",
       "      <td>7636315.0</td>\n",
       "      <td>3.2508</td>\n",
       "      <td>53.349821</td>\n",
       "      <td>53.152</td>\n",
       "      <td>2.235099</td>\n",
       "      <td>54.34</td>\n",
       "      <td>47.19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>688599.XSHG</th>\n",
       "      <td>1462.9047</td>\n",
       "      <td>7.070663e+09</td>\n",
       "      <td>-2.278407e+10</td>\n",
       "      <td>2.332285</td>\n",
       "      <td>0.293511</td>\n",
       "      <td>0.293731</td>\n",
       "      <td>79.20</td>\n",
       "      <td>1.7271</td>\n",
       "      <td>69.0945</td>\n",
       "      <td>7.0865</td>\n",
       "      <td>2.8555</td>\n",
       "      <td>0.85</td>\n",
       "      <td>7.29</td>\n",
       "      <td>228419.06</td>\n",
       "      <td>149939021.0</td>\n",
       "      <td>5.1096</td>\n",
       "      <td>64.319386</td>\n",
       "      <td>62.578</td>\n",
       "      <td>7.849404</td>\n",
       "      <td>67.49</td>\n",
       "      <td>52.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>688981.XSHG</th>\n",
       "      <td>3575.3213</td>\n",
       "      <td>7.224635e+10</td>\n",
       "      <td>-1.672907e+11</td>\n",
       "      <td>0.630879</td>\n",
       "      <td>0.574801</td>\n",
       "      <td>0.471422</td>\n",
       "      <td>62.56</td>\n",
       "      <td>0.8819</td>\n",
       "      <td>28.5013</td>\n",
       "      <td>3.1852</td>\n",
       "      <td>8.8955</td>\n",
       "      <td>1.55</td>\n",
       "      <td>-0.54</td>\n",
       "      <td>165006.97</td>\n",
       "      <td>29096119.0</td>\n",
       "      <td>2.1994</td>\n",
       "      <td>44.927543</td>\n",
       "      <td>45.083</td>\n",
       "      <td>0.259521</td>\n",
       "      <td>45.20</td>\n",
       "      <td>39.93</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    市值         净营运资本   ...    close1  close2\n",
       "688363.XSHG   678.2821  2.528823e+09   ...    140.99  122.52\n",
       "688396.XSHG   752.4524  1.165588e+10   ...     57.00   46.41\n",
       "688561.XSHG   370.6434  4.786264e+09   ...     54.34   47.19\n",
       "688599.XSHG  1462.9047  7.070663e+09   ...     67.49   52.80\n",
       "688981.XSHG  3575.3213  7.224635e+10   ...     45.20   39.93\n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get each stock's closing price from 50 days ago\n",
    "df['close2']=list(get_price(hs300_stock_lst,  \n",
    "                       end_date=history_50ds, \n",
    "                       count = 1,\n",
    "                       fq ='pre',panel=False)['close'])\n",
    "df.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>市值</th>\n",
       "      <th>净营运资本</th>\n",
       "      <th>净债务</th>\n",
       "      <th>产权比率</th>\n",
       "      <th>非流动资产比率</th>\n",
       "      <th>股东权益比率</th>\n",
       "      <th>营收增长率</th>\n",
       "      <th>换手率</th>\n",
       "      <th>PE</th>\n",
       "      <th>PB</th>\n",
       "      <th>PS</th>\n",
       "      <th>总资产收益率</th>\n",
       "      <th>动量线</th>\n",
       "      <th>成交量</th>\n",
       "      <th>累计能量线</th>\n",
       "      <th>平均差</th>\n",
       "      <th>指数移动平均</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>乖离率</th>\n",
       "      <th>close1</th>\n",
       "      <th>close2</th>\n",
       "      <th>return</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>2811.9175</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>-4.061748e+11</td>\n",
       "      <td>11.600355</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.079363</td>\n",
       "      <td>10.57</td>\n",
       "      <td>0.7716</td>\n",
       "      <td>7.2001</td>\n",
       "      <td>0.8363</td>\n",
       "      <td>1.6179</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.58</td>\n",
       "      <td>1497278.18</td>\n",
       "      <td>668914281.0</td>\n",
       "      <td>-0.3986</td>\n",
       "      <td>14.308895</td>\n",
       "      <td>14.346</td>\n",
       "      <td>1.003764</td>\n",
       "      <td>14.49</td>\n",
       "      <td>14.55</td>\n",
       "      <td>-0.004124</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>2151.8584</td>\n",
       "      <td>3.048454e+11</td>\n",
       "      <td>-3.933097e+11</td>\n",
       "      <td>6.560801</td>\n",
       "      <td>0.173873</td>\n",
       "      <td>0.121636</td>\n",
       "      <td>0.65</td>\n",
       "      <td>0.7211</td>\n",
       "      <td>9.4960</td>\n",
       "      <td>0.9085</td>\n",
       "      <td>0.4748</td>\n",
       "      <td>0.14</td>\n",
       "      <td>1.04</td>\n",
       "      <td>700741.92</td>\n",
       "      <td>192617757.0</td>\n",
       "      <td>-0.3240</td>\n",
       "      <td>18.316412</td>\n",
       "      <td>18.330</td>\n",
       "      <td>0.981997</td>\n",
       "      <td>18.51</td>\n",
       "      <td>18.48</td>\n",
       "      <td>0.001623</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1192.6785</td>\n",
       "      <td>5.743872e+10</td>\n",
       "      <td>-5.531577e+10</td>\n",
       "      <td>2.306207</td>\n",
       "      <td>0.226818</td>\n",
       "      <td>0.299797</td>\n",
       "      <td>6.43</td>\n",
       "      <td>1.0023</td>\n",
       "      <td>17.4166</td>\n",
       "      <td>2.2699</td>\n",
       "      <td>1.0263</td>\n",
       "      <td>1.17</td>\n",
       "      <td>0.49</td>\n",
       "      <td>390225.97</td>\n",
       "      <td>57165038.0</td>\n",
       "      <td>1.5648</td>\n",
       "      <td>25.099482</td>\n",
       "      <td>25.285</td>\n",
       "      <td>-0.375717</td>\n",
       "      <td>25.19</td>\n",
       "      <td>22.86</td>\n",
       "      <td>0.101925</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>344.1928</td>\n",
       "      <td>9.296436e+09</td>\n",
       "      <td>-1.468575e+10</td>\n",
       "      <td>1.436590</td>\n",
       "      <td>0.370459</td>\n",
       "      <td>0.397642</td>\n",
       "      <td>-9.24</td>\n",
       "      <td>0.8867</td>\n",
       "      <td>53.8264</td>\n",
       "      <td>2.5271</td>\n",
       "      <td>1.9665</td>\n",
       "      <td>-0.32</td>\n",
       "      <td>-0.09</td>\n",
       "      <td>260702.03</td>\n",
       "      <td>55492859.0</td>\n",
       "      <td>0.5460</td>\n",
       "      <td>10.561782</td>\n",
       "      <td>10.581</td>\n",
       "      <td>0.841130</td>\n",
       "      <td>10.67</td>\n",
       "      <td>9.79</td>\n",
       "      <td>0.089888</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>477.3444</td>\n",
       "      <td>1.471469e+11</td>\n",
       "      <td>-1.244737e+11</td>\n",
       "      <td>4.345297</td>\n",
       "      <td>0.196149</td>\n",
       "      <td>0.170275</td>\n",
       "      <td>-12.56</td>\n",
       "      <td>1.2700</td>\n",
       "      <td>15.5290</td>\n",
       "      <td>0.6153</td>\n",
       "      <td>0.4702</td>\n",
       "      <td>0.09</td>\n",
       "      <td>0.48</td>\n",
       "      <td>896021.57</td>\n",
       "      <td>418744306.0</td>\n",
       "      <td>-0.2518</td>\n",
       "      <td>5.657840</td>\n",
       "      <td>5.630</td>\n",
       "      <td>3.374778</td>\n",
       "      <td>5.82</td>\n",
       "      <td>5.64</td>\n",
       "      <td>0.031915</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    市值         净营运资本    ...     close2    return\n",
       "000001.XSHE  2811.9175  0.000000e+00    ...      14.55 -0.004124\n",
       "000002.XSHE  2151.8584  3.048454e+11    ...      18.48  0.001623\n",
       "000063.XSHE  1192.6785  5.743872e+10    ...      22.86  0.101925\n",
       "000066.XSHE   344.1928  9.296436e+09    ...       9.79  0.089888\n",
       "000069.XSHE   477.3444  1.471469e+11    ...       5.64  0.031915\n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
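   ],
   "source": [
    "# Replace missing values in the table with 0\n",
    "df.fillna(0, inplace=True)\n",
    "# Compute the return: change of yesterday's close relative to the close 50 days ago\n",
    "df['return'] = df['close1'] / df['close2'] - 1\n",
    "# Check the result\n",
    "df.head()"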
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>市值</th>\n",
       "      <th>净营运资本</th>\n",
       "      <th>净债务</th>\n",
       "      <th>产权比率</th>\n",
       "      <th>非流动资产比率</th>\n",
       "      <th>股东权益比率</th>\n",
       "      <th>营收增长率</th>\n",
       "      <th>换手率</th>\n",
       "      <th>PE</th>\n",
       "      <th>PB</th>\n",
       "      <th>PS</th>\n",
       "      <th>总资产收益率</th>\n",
       "      <th>动量线</th>\n",
       "      <th>成交量</th>\n",
       "      <th>累计能量线</th>\n",
       "      <th>平均差</th>\n",
       "      <th>指数移动平均</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>乖离率</th>\n",
       "      <th>close1</th>\n",
       "      <th>close2</th>\n",
       "      <th>return</th>\n",
       "      <th>signal</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>2811.9175</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>-4.061748e+11</td>\n",
       "      <td>11.600355</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.079363</td>\n",
       "      <td>10.57</td>\n",
       "      <td>0.7716</td>\n",
       "      <td>7.2001</td>\n",
       "      <td>0.8363</td>\n",
       "      <td>1.6179</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.58</td>\n",
       "      <td>1497278.18</td>\n",
       "      <td>668914281.0</td>\n",
       "      <td>-0.3986</td>\n",
       "      <td>14.308895</td>\n",
       "      <td>14.346</td>\n",
       "      <td>1.003764</td>\n",
       "      <td>14.49</td>\n",
       "      <td>14.55</td>\n",
       "      <td>-0.004124</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>2151.8584</td>\n",
       "      <td>3.048454e+11</td>\n",
       "      <td>-3.933097e+11</td>\n",
       "      <td>6.560801</td>\n",
       "      <td>0.173873</td>\n",
       "      <td>0.121636</td>\n",
       "      <td>0.65</td>\n",
       "      <td>0.7211</td>\n",
       "      <td>9.4960</td>\n",
       "      <td>0.9085</td>\n",
       "      <td>0.4748</td>\n",
       "      <td>0.14</td>\n",
       "      <td>1.04</td>\n",
       "      <td>700741.92</td>\n",
       "      <td>192617757.0</td>\n",
       "      <td>-0.3240</td>\n",
       "      <td>18.316412</td>\n",
       "      <td>18.330</td>\n",
       "      <td>0.981997</td>\n",
       "      <td>18.51</td>\n",
       "      <td>18.48</td>\n",
       "      <td>0.001623</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1192.6785</td>\n",
       "      <td>5.743872e+10</td>\n",
       "      <td>-5.531577e+10</td>\n",
       "      <td>2.306207</td>\n",
       "      <td>0.226818</td>\n",
       "      <td>0.299797</td>\n",
       "      <td>6.43</td>\n",
       "      <td>1.0023</td>\n",
       "      <td>17.4166</td>\n",
       "      <td>2.2699</td>\n",
       "      <td>1.0263</td>\n",
       "      <td>1.17</td>\n",
       "      <td>0.49</td>\n",
       "      <td>390225.97</td>\n",
       "      <td>57165038.0</td>\n",
       "      <td>1.5648</td>\n",
       "      <td>25.099482</td>\n",
       "      <td>25.285</td>\n",
       "      <td>-0.375717</td>\n",
       "      <td>25.19</td>\n",
       "      <td>22.86</td>\n",
       "      <td>0.101925</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>344.1928</td>\n",
       "      <td>9.296436e+09</td>\n",
       "      <td>-1.468575e+10</td>\n",
       "      <td>1.436590</td>\n",
       "      <td>0.370459</td>\n",
       "      <td>0.397642</td>\n",
       "      <td>-9.24</td>\n",
       "      <td>0.8867</td>\n",
       "      <td>53.8264</td>\n",
       "      <td>2.5271</td>\n",
       "      <td>1.9665</td>\n",
       "      <td>-0.32</td>\n",
       "      <td>-0.09</td>\n",
       "      <td>260702.03</td>\n",
       "      <td>55492859.0</td>\n",
       "      <td>0.5460</td>\n",
       "      <td>10.561782</td>\n",
       "      <td>10.581</td>\n",
       "      <td>0.841130</td>\n",
       "      <td>10.67</td>\n",
       "      <td>9.79</td>\n",
       "      <td>0.089888</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>477.3444</td>\n",
       "      <td>1.471469e+11</td>\n",
       "      <td>-1.244737e+11</td>\n",
       "      <td>4.345297</td>\n",
       "      <td>0.196149</td>\n",
       "      <td>0.170275</td>\n",
       "      <td>-12.56</td>\n",
       "      <td>1.2700</td>\n",
       "      <td>15.5290</td>\n",
       "      <td>0.6153</td>\n",
       "      <td>0.4702</td>\n",
       "      <td>0.09</td>\n",
       "      <td>0.48</td>\n",
       "      <td>896021.57</td>\n",
       "      <td>418744306.0</td>\n",
       "      <td>-0.2518</td>\n",
       "      <td>5.657840</td>\n",
       "      <td>5.630</td>\n",
       "      <td>3.374778</td>\n",
       "      <td>5.82</td>\n",
       "      <td>5.64</td>\n",
       "      <td>0.031915</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    市值         净营运资本   ...      return  signal\n",
       "000001.XSHE  2811.9175  0.000000e+00   ...   -0.004124       0\n",
       "000002.XSHE  2151.8584  3.048454e+11   ...    0.001623       0\n",
       "000063.XSHE  1192.6785  5.743872e+10   ...    0.101925       0\n",
       "000066.XSHE   344.1928  9.296436e+09   ...    0.089888       0\n",
       "000069.XSHE   477.3444  1.471469e+11   ...    0.031915       0\n",
       "\n",
       "[5 rows x 23 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Label 1 if the return is at or above the cross-sectional mean, otherwise 0\n",
    "df['signal']=np.where(df['return']<df['return'].mean(),0,1)\n",
    "# Check the result\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.3 Preparing data for machine learning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use the factor values as sample features, so drop the helper columns added above\n",
    "x = df.drop(['close1', 'close2', 'return', 'signal'], axis=1)\n",
    "# Use signal as the classification label\n",
    "y = df['signal']\n",
    "x = x.fillna(0)\n",
    "y = y.fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Machine learning"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 Feature selection with a decision tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the train/test split helper\n",
    "from sklearn.model_selection import train_test_split\n",
    "# Import the decision tree classifier\n",
    "from sklearn.tree import DecisionTreeClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.0\n",
      "0.8\n"
     ]
    }
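   ],
   "source": [
    "# Split the data into a training set and a validation set\n",
    "# Note: train_test_split is called without random_state, so the split and the scores below vary between runs\n",
    "X_train,X_test,y_train,y_test=\\\n",
    "train_test_split(x,y,test_size = 0.3)\n",
    "# Create the decision tree classifier; random_state is fixed so the tree itself is reproducible\n",
    "clf = DecisionTreeClassifier(random_state=1000)\n",
    "# Fit on the training set\n",
    "clf.fit(X_train, y_train)\n",
    "# Accuracy on the training set and on the validation set\n",
    "print(clf.score(X_train, y_train))\n",
    "print(clf.score(X_test, y_test))"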
   ]
  },
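  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The split above is not seeded, so the two accuracies change from run to run. The cell below is a minimal sketch, not part of the original strategy, of a seeded and stratified split. The `*_seeded` variable names and the seed value 42 are illustrative assumptions; separate names are used so that the `X_train`/`X_test` variables used by later cells are left untouched."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch (assumption): seeded, stratified split so the accuracies are reproducible.\n",
    "# The *_seeded names and the seed 42 are illustrative, not from the original notebook.\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "# stratify=y keeps the share of 0/1 labels the same in both subsets\n",
    "X_train_seeded, X_test_seeded, y_train_seeded, y_test_seeded = train_test_split(\n",
    "    x, y, test_size=0.3, random_state=42, stratify=y)\n",
    "\n",
    "clf_seeded = DecisionTreeClassifier(random_state=1000)\n",
    "clf_seeded.fit(X_train_seeded, y_train_seeded)\n",
    "print(clf_seeded.score(X_train_seeded, y_train_seeded))\n",
    "print(clf_seeded.score(X_test_seeded, y_test_seeded))"
   ]
  },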
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>features</th>\n",
       "      <th>importance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>平均差</td>\n",
       "      <td>0.578145</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>营收增长率</td>\n",
       "      <td>0.085790</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>指数移动平均</td>\n",
       "      <td>0.084030</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>换手率</td>\n",
       "      <td>0.065497</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>乖离率</td>\n",
       "      <td>0.052255</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>净营运资本</td>\n",
       "      <td>0.051007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>PE</td>\n",
       "      <td>0.025057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>成交量</td>\n",
       "      <td>0.019973</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>PB</td>\n",
       "      <td>0.013911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>PS</td>\n",
       "      <td>0.010433</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>动量线</td>\n",
       "      <td>0.009989</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>市值</td>\n",
       "      <td>0.003911</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>股东权益比率</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>总资产收益率</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>非流动资产比率</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>累计能量线</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>产权比率</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>净债务</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>移动平均</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   features  importance\n",
       "15      平均差    0.578145\n",
       "6     营收增长率    0.085790\n",
       "16   指数移动平均    0.084030\n",
       "7       换手率    0.065497\n",
       "18      乖离率    0.052255\n",
       "1     净营运资本    0.051007\n",
       "8        PE    0.025057\n",
       "13      成交量    0.019973\n",
       "9        PB    0.013911\n",
       "10       PS    0.010433\n",
       "12      动量线    0.009989\n",
       "0        市值    0.003911\n",
       "5    股东权益比率    0.000000\n",
       "11   总资产收益率    0.000000\n",
       "4   非流动资产比率    0.000000\n",
       "14    累计能量线    0.000000\n",
       "3      产权比率    0.000000\n",
       "2       净债务    0.000000\n",
       "17     移动平均    0.000000"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Rank features by importance. The result has two columns: factor name and importance\n",
    "factor_weight = pd.DataFrame({'features':list(x.columns),\n",
    "                             'importance':clf.feature_importances_}).sort_values(\n",
    "    # Sort by importance in descending order so the most important features come first\n",
    "    by='importance', ascending = False)\n",
    "# Check the result\n",
    "factor_weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Random forest: feature importance selection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.0\n",
      "0.8333333333333334\n"
     ]
    }
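   ],
   "source": [
    "from sklearn.ensemble import RandomForestClassifier\n",
    "# Create the random forest classifier; random_state is fixed for reproducibility\n",
    "# A larger n_estimators generally performs better, but also takes longer to run\n",
    "forest = RandomForestClassifier(n_estimators=1000, random_state=0, n_jobs=-1)\n",
    "# Fit on the training set\n",
    "forest.fit(X_train, y_train)\n",
    "# Accuracy on the training set and on the validation set\n",
    "print(forest.score(X_train, y_train))\n",
    "print(forest.score(X_test, y_test))\n",
    "# In this run the validation accuracy is slightly higher than the decision tree's (0.833 vs 0.8); both models fit the training set perfectly"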
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>features</th>\n",
       "      <th>importance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>平均差</td>\n",
       "      <td>0.290277</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>动量线</td>\n",
       "      <td>0.098558</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>乖离率</td>\n",
       "      <td>0.065764</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>PB</td>\n",
       "      <td>0.059026</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>移动平均</td>\n",
       "      <td>0.054337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>指数移动平均</td>\n",
       "      <td>0.047897</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>换手率</td>\n",
       "      <td>0.044507</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>营收增长率</td>\n",
       "      <td>0.044127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>PE</td>\n",
       "      <td>0.040370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>PS</td>\n",
       "      <td>0.034475</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>成交量</td>\n",
       "      <td>0.030955</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>累计能量线</td>\n",
       "      <td>0.030834</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>净债务</td>\n",
       "      <td>0.028398</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>总资产收益率</td>\n",
       "      <td>0.026873</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>股东权益比率</td>\n",
       "      <td>0.021834</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>非流动资产比率</td>\n",
       "      <td>0.021411</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>市值</td>\n",
       "      <td>0.020534</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>产权比率</td>\n",
       "      <td>0.020058</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>净营运资本</td>\n",
       "      <td>0.019764</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   features  importance\n",
       "15      平均差    0.290277\n",
       "12      动量线    0.098558\n",
       "18      乖离率    0.065764\n",
       "9        PB    0.059026\n",
       "17     移动平均    0.054337\n",
       "16   指数移动平均    0.047897\n",
       "7       换手率    0.044507\n",
       "6     营收增长率    0.044127\n",
       "8        PE    0.040370\n",
       "10       PS    0.034475\n",
       "13      成交量    0.030955\n",
       "14    累计能量线    0.030834\n",
       "2       净债务    0.028398\n",
       "11   总资产收益率    0.026873\n",
       "5    股东权益比率    0.021834\n",
       "4   非流动资产比率    0.021411\n",
       "0        市值    0.020534\n",
       "3      产权比率    0.020058\n",
       "1     净营运资本    0.019764"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
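   ],
   "source": [
    "# Rank features by importance. The result has two columns: factor name and importance\n",
    "factor_weight = pd.DataFrame({'features':list(x.columns),\n",
    "                             'importance':forest.feature_importances_}).sort_values(\n",
    "    # Sort by importance in descending order so the most important features come first\n",
    "    by='importance', ascending = False)\n",
    "# Check the result; the importances differ somewhat from the decision tree's\n",
    "factor_weight"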
   ]
  },
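  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ranking above can be turned into a factor subset for the downstream model. The cell below is a minimal sketch, assuming the top 10 factors are kept (the experiment notes at the top of this notebook report that 10 features performed better than 20 or all of them). `top_k`, `top_features` and `x_selected` are illustrative names that do not appear in the original notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch (assumption): keep the k most important factors.\n",
    "# top_k, top_features and x_selected are illustrative names, not from the original notebook.\n",
    "top_k = 10\n",
    "# factor_weight is already sorted by importance in descending order\n",
    "top_features = factor_weight['features'].head(top_k).tolist()\n",
    "# Restrict the feature matrix to the selected factor columns\n",
    "x_selected = x[top_features]\n",
    "x_selected.shape"
   ]
  },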
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(19, 2)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "factor_weight.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAEYCAYAAAAJeGK1AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzt3XmYJFWV/vHvS9MNCMjaNJsILuygQonQrLIpgmwOsqiAgKjIiCDyE3UQkNGRGWEQGRRlU5FxBwFBkH1plgYcRFZRFlmbTaEB227O749z08ouqrsis6Kqoqrfz/Pkk5mRGbduZkXGibhx77mKCMzMzJpmvpGugJmZWX8coMzMrJEcoMzMrJEcoMzMrJEcoMzMrJEcoMzMrJEcoMzMrJEcoGzUkPSgpGclPVFuJw2irN3qrFs/5e8j6SlJf5V05FD+rQp1GdLPajZU5IG6NlpIehDYNyKuGmQ5qwOnRsS766jXXP7O0cDKEbHvUP6dAeowLJ/VbCj4DMpGPUnvk3S3pL+UoNBa/nFJf5I0TdKVkpaRtAJwAbBxOQvbS9K+ks5qW29fSVeVx1tI+qWk4yQ9KWlDSfNL+u/y9+6UtGHFej4oaX9Jf5D0jKSDJB0j6TFJj0h6X9t7Xyrvvbd8hve2vbaepJslPS7p4vKZkLSypNslfbKUuU8/n3V1SVPKZ/mTpJ36/M2PS/qjpIf71Gc9STeW9f4gaaGy/Avlc/1R0o4d/uvM5soBykY1SROBs4EdgTWAPSRtVV6+HngnsBy5rR8UEY8CHwNuiIhlI+JHFf7MtsBzETEpIm4EPg6sBqwMHAqcK2n+ilXeGXgbsDvwTWAZYAXgGOCEtvctBKwNrA58FDhH0iKSFgDOB74SEcsBvwbOaVtvDeCtwIoRcXY/n/UR4F8jYhLwNaC9mXQh8vtaFTiwVR9JCwLnAV8v620bES9L2h74ILAW+f2fKWmJit+D2YAcoGy0+UXbNai9gO2AmyPi/oh4AbgI2AYgIu6MiGciYiZwLbDSIP7uf7c9/gBwRkTMjIjLgNcDb6lYztmlPlcB44DvRrazX9VPGd+PdDXwNLABsCHwSkRcUN5zKvAuSSuW5wsAx0fEq/398YiYHhFTy9Mree138t2y7m/b6vMu4B8R8ctSxqNt38M5pcy7gAfLe81q4QBlo82u5WygdUawHLBlK2iRZxuLAZQmtLskPQV8hurbe9/3TYuIWW3PlwO+0/Y3xwNLVyz7GYASpACeLfczyYDV7m9tj58mz7aWA55qLSzlPEeehbWWPTGnPy5pXUmXl3pPAdTnLU+1lduqz3JAf2UuB3yp7Xt4C7DknP62WaeqNkuYNdWTwKURsVP7QklvIpuoNoqI2yV9FVi+vNy3Z9BMZv8tTKrwN7/QOqPoUCe9kpYCHiiPlyGDxywyMAAgaQIZFB6n/wDc9++dTDZ9vgd4E3Bvhfo9BSzbz/IngaMj4sS5fgqzLvkMyka73wAbSVoHQNLSkpYHFgH+Dtxfrots37bOs8AKksZJEvDHUsbrJS0JfHiAv/kL4ONtHQXWrfcj/dN+SluSQehm4EZgvKRdy3s+BdxJXlvqT9/PuihwdzlD+lDFetwELCJpZwBJK5Rrbr8A9i7fGZJWLderzGrhAGWjWkQ8DnyE7ETwBOVaU0TcQXYmeBC4BPhh22p3AncBDwMfLh0ffgPcT16XOXWAP/s/wB3APeVvHt/3DaUH3UHALpI+3+XHe7jU6XTgIxHxYkT8nexo8XlJT5KdE/aIOY8Xme2zAl8Bjpf0Z/Ks69k5rPdPETEd2BU4qjSXXg5MiIhfkR00ppa6nA1M6PKzmr2Gx0GZNZCkIMdQPTTSdTEbKT6DMmuuvh0YzOYpDlBmZtZIbuIzM7NG8hmUmZk1UqVxUJIOBfYAZgB7R8Sfy/IVgO8AiwMLAkdFxK9LF9TTyZQpDwP7RMQrcyp/
6aWXjpVXXnkwn8PMzEaJW2+99emImDjQ+wYMUCWFyl7ARsAWZJfaVvr+J4BDIuKBEqyuJHOD7Q48HxEbSToO2B84ZU5/Y+WVV2bq1KlzetnMzMYQSZV6p1Zp4tsKuKwM7LscmNx6ISJmRURrpPvbyTMsgK3JnGgAF5YyzMzMKqvSxDeJzANGRISkVyVNiIgZAJI2Ar5F5iLbq+86wDT6SR0j6UAyYzIrrTSYHJ5mZjYWVTmDGs/s4zFUlgEQEVMiYn1gF+Bf+1lH9DO6PCJOi4ieiOiZOHHApkgzM5vHVAlQT5JJKym5vMaX1CeziYjbgPUkLd2+DnlmNcfsymZmZv2pEqCuAbaVNI68lnSLpGMl7Shp+TKBGpLeTE5z8BxwNb3JOXcoz83MzCob8BpURNwn6Vxy7pgZwD7AEWRW5LWBb0h6gWzG2zciZimnzz5D0o1kluW9h6j+ZmY2RjUik0RPT0+4m7mZ2bxB0q0R0TPQ+0blhIUaRArNBsRjMzOrwKmOzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskRygzMyskSoFKEmHSrpJ0rWSVmlbPkHSSZJukHS7pK3L8k0lPSbpunJbYag+gJmZjU0DBihJKwJ7ARsDxwDHt16LiBnAJRExGdgDOKG8tBRwZkRsUm6P1l5zMzMb06qcQW0FXBYRM4HLgcntL0bExeXhY8Di5fGSwNN1VdLMzOY9VQLUJEqwiYgAXpU0oZ/3fQC4oO35AZKul3SKpIUGX1UzM5uXVAlQ4wG1PVdZ1rsgmwEPJ5sAiYgzImItYBNgFnBY30IlHShpqqSp06ZN67L6ZmY2VlUJUE+S15SQJGB8RExvvShpAeDHwGci4qn2FcsZ18XAmn0LjYjTIqInInomTpw4iI9gZmZjUZUAdQ2wraRx5PWoWyQdK2nH8vp3gJ9ExG9bK0hapm39TYE76qqwmZnNG+Yf6A0RcZ+kc4EpwAxgH+AIYFFJGwIfAt4iabeyyoHAjuX5K8BDwAFDUXkzMxu7lK1wI6unpyemTp1a+f3SwO+ZkwZ8XDOzeZqkWyOiZ6D3OZOEmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1kgOUmZk1UqUAJelQSTdJulbSKm3LJ0g6SdINkm6XtHVZPr+ksyVNkfRjSQsO1QcwM7OxacAAJWlFYC9gY+AY4PjWaxExA7gkIiYDewAnlJd2B56PiI2A+4H9a663mZmNcVXOoLYCLouImcDlwOT2FyPi4vLwMWDx8nhr4KLy+MJShpmZWWVVAtQk4GmAiAjgVUkT+nnfB4AL+q4DTCvPZyPpQElTJU2dNm1axxU3M7OxrUqAGg+o7bnKst4F2Qx4ONkE2HcdAa8JaBFxWkT0RETPxIkTO623mZmNcVUC1JPAUgCSBIyPiOmtFyUtAPwY+ExEPNV3HWBp4InaamxmZvOEKgHqGmBbSePIa0m3SDpW0o7l9e8AP4mI37atczWwfXm8Q3luZmZW2fwDvSEi7pN0LjAFmAHsAxwBLCppQ+BDwFsk7VZWORA4CzhD0o3AI8DeQ1B3MzMbw5T9HkZWT09PTJ06tfL7pYHfMycN+LhmZvM0SbdGRM9A73MmCTMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMzayQHKDMza6RKAUrSoZJuknStpFX6vCZJ
J0q6qm3ZppIek3Rdua1Qc73NzGyMm3+gN0haEdgL2AjYAjge2K3tLd8CXuqz2lLAmRHxxXqqaWZm85oqZ1BbAZdFxEzgcmByn9ePBk7ps2xJ4OlB187MzOZZVQLUJEqwiYgAXpU0ofViREybw3oHSLpe0imSFur7oqQDJU2VNHXatDkVYWZm86oqAWo8oLbnKsvmKCLOiIi1gE2AWcBh/bzntIjoiYieiRMndlBlMzObF1QJUE+S15SQJGB8REyvUng547oYWLPrGpqZ2TypSoC6BthW0jjyetQtko6VtOOcVpC0TNvTTYE7BldNMzOb1wzYiy8i7pN0LjAFmAHsAxwBLDqX1faTtBvwCvAQcEANdTUzs3mIshVuZPX09MTUqVMrv18a+D1z0oCPa2Y2T5N0a0T0DPQ+Z5IwM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGqhSgJB0q6SZJ10papc9rknSipKvals0v6WxJUyT9WNKCNdfbzMzGuAEDlKQVgb2AjYFjgOP7vOVbwMw+y3YHno+IjYD7gf0HX1UzM5uXVDmD2gq4LCJmApcDk/u8fjRwSp9lWwMXlccXljLMzMwqqxKgJgFPA0REAK9KmtB6MSKmzW0dYFp5PhtJB0qaKmnqtGn9FWFmZvOyKgFqPKC25yrLqq4jYELfN0TEaRHRExE9EydOrFLXISF1fzMzs6FTJUA9CSwF2SECGB8R06uuAywNPNF1Dc3MbJ5UJUBdA2wraRx5LekWScdK2nEu61wNbF8e71Cem5mZVTZggIqI+4BzgSlkh4hDyGtKi85ltbOAJSXdCKzGaztRmJmZzZWy38PI6unpialTp1Z+/2Cu//T9uHWWZWZmA5N0a0T0DPQ+Z5IwM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGcoAyM7NGmn+kKzCWODO6mVl9fAZlZmaN5ABlZmaN5ABlZmaN5GtQDeXrWWY2r/MZlJmZNVKlACXpUEk3SbpW0ip9Xtu8vHazpO3Lsk0lPSbpunJbYSgqb2ZmY9eATXySVgT2AjYCtgCOB3Zre8vJwHuBGcB1ki4GlgLOjIgv1l1h65ybC81sNKpyBrUVcFlEzAQuBya3XpD0RuDFiHgsIp4GHgTWBpYEnq6/umZmNq+oEqAmUYJNRATwqqQJfV8rppVlAAdIul7SKZIW6luopAMlTZU0ddq0ad1/AjMzG5OqBKjxQHsjkcqyOb02ISLOiIi1gE2AWcBhfQuNiNMioicieiZOnNhV5c3MbOyqEqCeJK8pIUnA+IiY3ve1YmngidaTcsZ1MbBmLbW1ESd1fzMz60SVAHUNsK2kceT1qFskHStpR+ABYGlJy0taGlgV+L2kZdrW3xS4o+6K2+jnYGdmczNgL76IuE/SucAUsqfePsARwKIREZI+AZxf3n5IRMyQtJ+k3YBXgIeAA4am+mbJPRXNxh5FA36dPT09MXXq1Mrvr3Nn5LJcVgN+AmbzFEm3RkTPQO9zqiOzPhzszJrBqY7MzKyRHKDMzKyR3MRnNoTcXGjWPZ9BmZlZI/kMymyUcK9Hm9f4DMrMzBrJZ1BmNig+G7Oh4gBlZo0x2DRWDnhji5v4zMyskXwGZWZjks/GRj8HKDOzAdQZ7JpaVhM5QJmZWSODna9BmZlZIzlAmZlZIzlAmZlZIzlAmZlZIzlAmZlZIzlAmZlZIzlAmZlZI1UKUJIOlXSTpGslrdLntc3LazdL2r4sm1/S2ZKmSPqxpAWHovJmZjZ2DRigJK0I7AVsDBwDHN/nLScDuwDvA74haT5gd+D5iNgIuB/Yv85K
m5nZ2FflDGor4LKImAlcDkxuvSDpjcCLEfFYRDwNPAisDWwNXFTedmEpw8zMrLIqqY4mAU8DRERIelXShIiY0f5aMa0sa1/eWjYbSQcCB5anL0q6t7uP8BpL96lTn79bX3kuy2W5rOaU1UV5LmvkynpjlTdVCVDjgVnt9SjLZpR79XltQp/lrWWziYjTgNOqVLITkqZGRE8Ty3NZLstluSyXVV2VJr4ngaUAJAkYHxHT+75WLA080Wd5a5mZmVllVQLUNcC2ksaR15JukXSspB2BB4ClJS0vaWlgVeD3wNXA9mX9HcpzMzOzygZs4ouI+ySdC0whm/X2AY4AFi3XpD4BnF/efkhEzJB0FnCGpBuBR4C9h6T2/au72bDO8lyWy3JZLstlVaRo+oxVZmY2T3ImCTMzayQHKDMzayQHKDMzayQHKDMza6QxFaAkTe7zvKtBZZJWlrRCeTxO0hGS1uywjJ36PJ9f0mZd1mcdScuVx2+XdICkjbssaw1Jb5O0tqS3SlpB0usaUFad31ctZUnat+1x321r377vH6Csd5ahGq3nq0k6RtLHJY3vtG5zqldZ1lWv2Tq2+7ayFpC0ZDfrtpXRuG2izrIkTZC0xFxe320k6lXW7S/7z56SFu2mvG6N+l58kq4BjgP+BPwoIjaQdGpEfFLSzRGxQcVylgHWA9YA7gZ6gO2Az5Kpmk6NiOU7qNetwG3AJ4HPkAcDO0RERxuMpBOALcrTbwMfAa4r9bs5Ir7YYXm3A/9X6jM/meVjsXK7NCK+NEJl1fJ91VmWpGta60i6IiK27O+1imVNBTaIiFclLUt+bycDCwErRsQ+HZTV+rvPlDK+DixMDunYEdi4va4DlFXrdt9W7juBd5Ofc7FS11si4m8dlNG4baLOsiS9Ddic/J7X6ecti1T9P9ZZr1LWbPvOMpzow8C7I+IfnZbXrVF9BiVpfTKt0luYPaPF+q23dFDcV4G/A5sBNwFvAw4GfgJ8Dji8w+rNAG4G3gMsC1zcYX1aNouI9cgkvV8EtomII4Fty61TL0bEvhGxNzmm7TrgPWVj3HkEy6rr+6q7rLq8HBGvlsfHAj+LiOPKAcbKHZb1A+D75E5IwMfJILUPsEmHZdW63Ut6TNKmwFrAXcCXgTcAuwG/7bBuTd0m6iorynobAd8DfgUsSSbY/t4I1ov29ST9G/BB4IbhDE5QLRdfI5Xmkq+T/+Q6TCOnDNmIPCrdBngzubFsAvy4izJnkEeQR5JneN3UdTpARLwi6YGIeKU8D0kvd1FeAEg6kUxJtQB51H0eMHMEy4J6vq86y1pT0i/IH+s65THl+RodlvWKpIPJM6adyKz/LQt1WFbr8zxNbqP3AksAlc9O2tS93T8OfAH4K/BpspXmG/DPI/xONW2bqLOs/yED0jLA8+R28A/gRfL7G6l6AYSkjYCvAVeSB8M3dFlW10ZtgIqIWcDWkh4gT4+ntr28hqQb6Oyf82syKe5kct6rBYFPRcSTkr5Jnt6eXaWg0ha8CPASebR7CbnDXlrSB0v9f1KxXmvVsZMsqahOAJYp7ch3RMSZktYG/o0MKrsOd1mlvNq+r5q/+w+0Pf7vPq/1fT6QjwCHkM2gm0fEtFLfRcl0Yt1oHYG/CVgOeDuwPCO03RfTgX8HTo6IpyS11+XSqoU0dZuoefv6PHkQ8L62ZdHnvpKaP+M+ZA7Vi4BvAn8m5wRcqnV9MyK+30n9ujVqA1SbcWQT39lk1B8H3BMRkyXdUrWQiLgWQNJPyaOZG4ANyTROJ5NnBlVtCKxJNpssQh593F3q2ulFxlp2khHxtKRvkYHjBuA/yvI7Jb1UHv9puMsq6vy+aisrIq4u14uWB+5tS5LcsYh4QtJJpaxH2pa/QOfNxy2tZphxZHP9ODps0ql5u2/VaQlgXLnIv66kPYEfl6bpqhq5TdRc1gvAy+RZ7EltZXyePGh4boTqtSC5PY0nz+oWJP+v89H52f6gjNpOEsqZe39K
Hjn+EPgj8DPgL+S1kQ0k3RIR7+ygzK2Aj5LXev5ONsU8Aizd6RGDpCnAV8gL0H8CNgDW6/KCZS07ybbyPg/cGBFXlZ5Wq0dEV6fvdZVV8/dVS1mSDiLzTt4NrALsHxHXd1qfOZR1QERc12VZV5JH2DeSO6b2+43IFuBOLq7Xst1LOpvcST5ONvkeQ/5Gp5BNmtt3cg2jidtEnWVJWo3MaTp1Dq9fGRHvHu56lbJuITvK/Ad57fBYYEon+9M6jNpOEuWC81fpPWL8K3AP2fa6gkp32aok/YhsCz4XuJXMwr4T8DvgX7qs5vzkafJeZLt8x8qO7Uay2eRWddm9vJS1dzlF3x9YqTzeFTh1JMtqM+jvq+ayDgDWjojtgC3JHXi3+pY1mM+3Urm1OgatRnZEWKzTgmre7o8hj/p/R85gsCEZLA8Drqe7769p20SdZZ0CHC7pq/3dyMA+EvWC/L9dA2xayrycvBY1rEZtgAKIiFvJdtZHyQvEiojzgS9R2sE7KO5gYAXyn/oLYEWye/d0YJVyxtaJBYE9yKD5A7JDRzfq3EnOonfyyZnl8cvAI+p8zFidZUF931edZb0YES8CRMRjDK55o86yDiOPbk8nD9DOJLf5HwBXdFhWbdt9adZ9H/AQcD951vTr8vJ/kIGvE03cJmorKyK2JofIrAW8nwziv+lzG/Z6FSp1nBURXwDOAjZXl+McuxYRY+ZGns62Hm/WZRnLAmuWx5uSO5Itgfk7LGf/Ps9fD+zSRX2u6fP8yhq+p7512wDYcCTLquv7qvm7nwk8W27PkT2sWo+fHamy+pT7mu0c2K+LcmrZ7tvKWwR4S59lC4z2bWIIytqj3H8A2KTb7WAI6rVSP8u2B143mDp2ehu116BaJK1B9oyaRbafvwQ8FxEvDaLMydHnOoqkvaOz9vj1IuK2tueTyAuWAE9HxIyK5cykt/uwyB/+C+VxRMRgR+tvFxEXD6aMISprv4g4Y5BlvCMibm97vnhEPD/42o08SetHtiAgaVxkr9bWa7tExC+7KHNQ272khYAlI+JRSdtGxKV9Xp8/IroZfoCkd0XETW3P942IszosY3y0XQMrHar2JQeVPzLHFYewLEk3RHbo+ih5PR3y+uKlrf/vSNSrbf11I+KOTtery1gIULVkM9DcR+e/nzy66eTC811kV0/IHlGnk6O8x5FHJ8N6sbHU6Tzg/wF/jYgnyrIbIuI16XKGuawvRsS/l8c/iYgPSrotcoBy19Q2Gl7S/OTA0c0j4vHBlNsEbTu2TwFvjrzOg6S1gO9ERKUBu3Vu98rMCDuQv735S5mrkzvco8jfwfqdBCnlcJF/Ja8t30heLxPwoegss8tSwK8j4l1tyxYmB6AeERGdDNmos6zW//EW4OiyeMFSzrvmvObQ1UvSb4AnyGbkq8nMIi3TgXMi4r2d1K1bo/oaVFFXNoO5jc7ftIt6zSAH4j1Ppo95jJxZ+Cdk885IWIEcM3aZpAvLzqzbkeZ1lrW2pNXL47eU+67KkvRTZfqevmUcC1w2FoJTIWUmlQ8D50hqfdY9gHM6KKfO7b51tLthKes95E7uc2Tw+3LV4CTpOGVGivHk+LGjyE4455VbR2fCEfEMcIWk3duWTY+IM+lwUHmdZfUp96Jy+znZ0aTT9euq1+vITBZHkP+3s8ht4uvAocC1ndatW2NhHFRd2QzqHJ0PsC75D16KHFsi4ALgDnLMw0iIiPgZ8DNJ7yGPhroeaV5jWXeTF4rv6XL9dicBP5e0OTkubj4yOG3A7EeCY8EbgT3JawP/rRxguSeZrqiqOrf7I8m0TW8o5UEenL0dODEiOunA8Ut6M8U8FxF3KZPqrk3+lip3MpG0Unn4U+D0clbWPoascrLemsvaAFhW0qeBieW+5S5Jn46Ibw53vYp9gSUi4ghJu0fEnuXvnE3vmd6QG7UBSjVnM2gz6NH55WziFnLHsT+9R5SHA5+IiG7SJtVhUp8fwer0/kAAqPqD
qLmsu4ADyve2rKSjSvlHkd/7vVFxFHxEXCfpfOAg8jufSqbteW+31z+aRtKbyJ1HK6vIKZIeJLe5Y6O7sXJ1ZKX4FZkZYQEyQC1N5vd7HXCipL0i4g+VKpPXX7YuzbSHlMWvkmdOorODz5Po/XwPktkR/vmnyJaOYS1LOWbws2WdJ8nP82QH9RjKei1IHiAcKWmLsnw9ckDx5RHx50HUsyOjNkBF/dkMWgY9Op/cOc4ir4etDryz3C9P54lB6/QPsm25/fPM7GfZcJf1ALkt3kSmBLqJbA69kfz+TyGbRqv6JvAp8oe5eUS8oJxK4oSI6OagpTEkLQCcQW5b7V4hz2KX7bbocj+Y7f4uMsBNLzeR1ykhu7P/B3ldazBmdlqviNil9VjSIRFxUtvzjwI/H+6yIuJZYPdyDerHkg4fzIFrjZ9xNWAieRa8M3nQ8TKwOHlC0MnvcNBGbYACiIibgZslPU+OhWodAXx3MMWW231kIsdbyZ1B5SPJiPiUpMfJRIsfJDeOSeRo/eXmtu4Qe651JlKaF54gexR2s9HVWdZfgZci4jeSXmi7v7SU/5eqBUnarzz8G3kEv1u5PCMyl1hXPdyaIiL+DmwhaYqk/cnBkyuTO/93AhdImhQRnR6ND3q7J1sKbiabmfYFPkHu2II86Hh91YJKh4ud8qH+i0z/Mx/5PxXd77v2YfYm9teRrRwnjnBZ7dsuwPRBBKyu6xURU5Q9o/+TPIv6FbBzRFwh6WqySf+2iLixy7p1ZFQHKPVOzLY/8Fh5PoHs9dNJOzzkyPwgBxhC7+j8u7us3kPkIMWFy/OXgN3JVDQjJcoP/1QyKPxbQ8r6G7lT7P8PdZaqpb+j69ay71PPda4mCOB2smfcF8g0Qk+VVoV9qT5Is87tfiL527uUPKP7UNtrO9BZZoRPk5/h/eT2sTd59vQ78v9ZeRhJ2/UZAQtIegO928T1ZCeASkGlzrLayoFMUTSxn+XVCqm3Xp8lc5u2mnlfUma2gBzM/QmydWPIjeoARW8mA+g99f9nNoOYQ46rOTis3D9O/jDOJIPLQ2TixS06rNvi5I8M8gzqU+VeygzPe0TFsVA1eoZsejkwIu6kVKYBZf2N3h/TA12WAUBEnN56LOkTMchNs2V9AAAW4UlEQVSxVA2miLhN0qnAphFxQVl+ITm8oWqAmtt2vw056WAlEXFc6ax0ErALOXFeN1PCEBH7A5Tm+hPIQHw+2QkE5nJA04/26zP3Mfv1GYA/qM/4oQHKahlsWZCBiYj4VcX3z61eg/6MkhYEtga+FhGzytnroWSvZMhm3AMHWdfKRv04KABJ+7d2TJK2I3ee4yKio1xWkt5MjrMYT+40XyEHxU4DPtzJzk45vuRRsi3/EbJL9sfIHcADXTTBDAlJ74uIXw/8zmEv6+CI+FbphbdL6XrbaRm7RcRP66hP06gMqi0HBUuWLsYoM4ePi4gfdlHmZpH519qXdTVgWtIeZLLZb3W67hzK2zjaEvVK+lhEDKYp3+ZAbQO/y/709ihjHcuyhbvsiNN5XUZzgFKNg0XLuqeS3cLvoTfV/OJkk8eTEbFXhTLWIjNa7Epel5lJdnE+ipxz55DIvHojoi0IzyKPiv4ZhDs9o6uzrLYyn6d0O4+I1ysHom4aEXt0UMa6bU8fbe28x4rSg/UnEbGlMpnwM8BFEfGipF2Br0TEWhXLmo886l4PeCoiHpK0ckQ8WK5x/aHT6w2SziKbCtckj7jnI9NfdTrH0e79XYcpdX5vtwdDGmRWCknLR8RjktaIiG4vAQybchCzd0R0Mq8XyrGJf4mSQ1LSCpFZQj7QzQFjN0Z7E19rsOiXJT1EBqvBTu39s/4u9Eu6ueL67yJ7uLXqcSs5UHFN8kxq3BzWGy6HM4cgLOmJiPjQ3FYeqrIk/Z48u7wjIjaTdIukL5CDPSv1/FKOk3kb2d5+c6nT7WVH+yKZJurRiNi2ar0a6tv09uJrDWf4nKRfk+Pvzu2g
rH3I1oIXyGwn6wIXlY4v9wGnlWUDKmdNu5M5/VaTNCUiNir33RwJHy7pY+QBy7VkEJ5ODh5dmN5EtJWoNyvF5yTNlpWC3GaqOo886DydPOgcccoJCz8G/IHsOHNZ9Ka/OqHcV51wtXV2eiBwlnIIw0nAO5TjC/+TDno+DsZoD1ARNQ0WVc6lsjKwpaTPkGdBt5P/7AvJEfZVPAqsT+84jUPIC4qtDh2djLkYKoMNwrWWpcwX9gLZ2eJtyvmOViEHjr47cmqVKhYlB+XeRu7AnyF3QE+RO5WdxkBwggxQJ5ZOQSuQA3b/Tl7UXpbOZvu9BDg+Ij4iqXWmeR6wQ+n+3EnGhsXI736Dcma3TJ/7hyPiwg7Kgxx2sA6ZuPZoZe/Yh8ks/5VIOo7MDN6eleJ75AGWyADfCak3RVRTLE2O+buIvG74ZeXs2+uRv61PdlDWR8me0Cq3n5ADuK+NiL8O4lpzx0Z7gKptsGg50jsVuKr8MBciu+3uTM5zc8hcC+j1aHn/i+X2+z6vr69MJHtUxfJqU2MQrrUsspfRwuQZ1D7kDuNmslvyGZI+WuUIPCKelbQLOUX1H4C3kjuz0duO3b+tyLPBCeR3/ouIuBdAmU3iR1SfCXdbYM0S7JaQ9E6yyfYA5ZirTmZjXZwck7g9ebDxarmfVe5PIreNuSpnwlsDEyLicUlPlHq8mxzMuhL52asO1q0tK0WxADmuq9OewkNtRkTcIunP5PdzEPm73LDtbKpT85H/z6XIiWGH1WgPULUNFlWOzj8UmCnpgIj4XjmiFJnf78ByfWugf/TT5I/xYXI6hf4m+Rq2I5B2NQbhWssic74dSTbdrEMewa8YEWtLOpoc4/P/5rx6Us5V81VyBtA7mX3Q6pgJUhFxpKRNIuJ7AJK+L2k62Znn12RTTye+R56xfqI8/yO9vSn3r1JA+f/vR16LfDUizpH0qXJ/cLmvmvtzEWBz4EVJl5Nn0ucB25WDkCPJ4Q37VCksasxKocwP+EpkQuNuWhyG0vslbUMeEJwNrEo2Rf5U0vZdXhc+nxwLdQK93eB9BlVRnYNFz2qNt5F0IPmjPZgMMvdG9ak2XibPnO4nj/b+jTzi+nu5fyQidp/z6kOnxiBca1kR8Q9ld9a1yPyF25M7JCLiaGXPtCpmkDN/fp78vls9j9Ym88utoJItvWJ5TdYecFcnt9UlgdOrdpAo7iQH1h5DBvd2Iq+h3tZ3pddUJuJlZTqcnwCtKVdaO7Jzynt+UKVCEfEc+T9sZejeB/h7ZPYFIuJrHWwTc9NxVgryGhuSFmEQiWGHwMPkmeK1kYNtJwILRsT1ks4Avkz1yU7bv5NdyE5iP6X7LCVdG+0Bqs7Bou3/lNXKDvM7wF+A48lBnlW8TO/U20GeUZwZEbtKunqkglNRVxCutSxJXyabECaTR2u7AL9Uby6++6qUExEzJV1E7qjfRR4kTCUzdr+OnBxwrGjfXl+MzKrSSoVUrYBs5lqC3LafJq9VXFAeH0weKFSe5jsipks6jJyVF+AgSR+giwvqki6jNwirLNuP/G21UphV6gyiGrNSRMTBkm4jp6FozGwQEXEZObPAzeV7OoVslbghIs6V1EmT+8bl/o/kScCt5Xe1qqR1yObzYTHaA1Sdg0Xb/ZlsJjqU/AEvUnXFiJgh6VhmzxjRqtOfJK3dqusIqCsI113WjeRO7a1kSpxXyZxuPyZ3AieXx1X0kL0KP0v2zFqETC+1TUR8voM6Nd0tbTvxVyX9lkzyeaekR6Jtssy5mET2yLqUPEpuDRdoNdP9XVLVo24kHUH2umzNwfZweX5oaQI8vr9u4/2JiG36KX8icFq05Z2rqLasFMVMsln6Kkk7kv+DZ4CbY4QSEpdrwkE2kd9IHij+l3rTfAX5+6xiEUlbR8QpkvaT9OHWnyEPWj49l3VrNarHQfVHXQ4WVVv+Mkl7RMT/lsdvJMfhdHSBUJlqZAZ5XeDNEXG/Mh3JtIjoanT9YEm6
NiI2LY9/Rx5J7kkJwtHZhIy1lVXKWAqYHBEXSFqMnCjvovLav5TemlXK+Up5uC/Z6aIVSNeNiJ06qdNoUw7ONiMnxKzUnFZ2+peRzUN7kE2jr9J7xhnk/GovVChryVYzXD+vrUo20z1UsV6tHe5si4EVImKlflapUubV5A72fPIzTyov7RERlXNkSro4IrYrO+4VyAP91cmz9gOiz2Dn4SRpKnlgvSd50PHZiKjUAtFWxgXAqRHx6/J/OLrt5c3IOdv27KB3bdfGXICyOZO0XJQJ+5RjVu4mL4YvRYdBuO6AXje1ZRcpz3cCftXleJwxTdJy5PWKYZtGYSRpiLJSlAPQxWOEpkiXtDzwDuBPEXF3aY7bJSKO7bCcZaM38cGeEXFun9e3iojLa6v43Ori3+u8Q9LXyPblD0fEw5KuJZsrToqI8wZZ9tvIFE4v1lDVQdEQZSAY61qdXUa6Hi0ahowN5ayz4ywLZd11RyoY9af0UtxAg8imU8r5INkbdE3ghaH67qtozEU+G1rlhxhkd9HTy5nOAuQ1gj0kdTRPj6SvSbpGvVmUv0VO9bBznfXu0uGSfivpZEkflNTKKH8yOf2J9W/YkoBW1DpoOn2u7xqApJ0kXSjp65LeqxwY3nICOcC5alm/kXS2Mt3UuZJWarstJemSwdR1kKR6BhAfTA7w/R8yYXCr8G9I+oGkTq4vD8po7yRh1W1Ldtf9ITlG4hSy+/UbgCuAL0i6Iiokgewn2B1ABrutgO9LmhW92bVHyqAzENiIq2uHW2eWhdeRPVaPIA94zqJ3JtwHyJRMI6WuAcQLkNeSrwQ2UQ5+v4TsgLQX2eV8WPgMah4REb8hj0R/RG/PxF3JcTD3kYHroIrFtYLdfcw52C0859WHhqTxyuzLE8q1tsvIXHUvMHsGAiskXVvOhK8le2Ne03a7VtKIXfCn3owNMyLiFnI24ovIbf0dwBFdZFnYF3hLRHybvEyyZ0S0OiX8qIa6dkxtA4gZfDfwN5GBCDK4H0BmCZkVEY+SHWmGhQPUvOVoslfhbeRYkr+Q88a8mRyrsluVQmoOdnXqm4Hgz2R34O0iM9H/lhwzZ722IM8q3k3+797ddmu9Nuxq3uFCZlm4Evhfcq6rVcksGT+VVOmgRTlb94Jkj8dJkrYoy9crvXYvH8GOJnUOIL6H2XOGjlhHBTfxzSMkbUk2ee1IJs78HDku4jBy/EZIOq2DIo8mx3rdRh6J9g12vyLH2AybYcxAMGa0nz1IoouziaFS5w63riwLq5Hpft5ADsD/FTkwf3GyububDDa1qHkAcdCQ1GA+g5p33EAOsDufDFQXkMlYvwL8Rjnos1IHghLs/pUMdl8jA913yOmgzyhdVDsJdrWRdJmkS8ksA9sBO0u6VdIfy/KPjkS9Rol/JjaWtLmkz45URSLiYPIA+mo6T+bat6zLIuJ44CRJawM/I38DlC7UD1csZwqwBpmSaxwZoBaOiCvI66/bSNpwMHUdpJlkNpZXJO0o6f2SJkvq9ETk7WTQnpNhC17uZj4PkXQh8GEyOP07mdxTwFGddCVVTgv9B/Ki8DvJpL1PktOGU8qMaMjUFuo+A8GYJ2l3MmjvQyblbU8vtCOwbWSy1ZGo282UjA3kgVBXGRvaBv2+g8yW8Sx5xg+922qlbtmSvkS2QCxFDvpdG2jN3H03sFVE7Fu1bnWqawCxpOvIFpJtySbNNcigvjfZ8nJyRKxf/yd4LTfxzVu+TGa3OCMiLgEukTSZzCAwtyOm2UTEK5LupoZgVzfNJQPBCFRnNJgE/Av5/Yxn9qauL0dEpbOLIfJMSbd0CrmTbO1wzypjtirtcCNiIxh8loVyYLY18LWImKVM73Uo+ZuCnD14xLrqR5mpu+8g+dYA4k6KIg86dwZ+3ko7VTo+bUeHE0UOhs+grCuS1id/kHtExJll2WQyLU7lYDcE9dqd2QcZ3jVSdRlNJC0BvKscuCBp/sjkuxOiu2kahoy6zNig
nHKilT6r2ywL41rX6UqP0dtbWRfKsoWrDNVoMknvJWcFWJIMvrtExBkjUhcHKOuWpC0i4qo5vPaaFCnDoXSL/hLwDeAHUSaslPQNynw2EbH3nEuYt5ROAi0PRMS/l4OP48kZZz8bER/uf+3hoRoyNqgtSXMr+JbHPcDzEfHHDsp6MzldfCu57ivkUIZpTQvmgyHpLLJjyJrkwaiAjYYzXZgDlHVN0i1kU8BzwP+R446uIqccX2wk2uIl3UTOa7QJORX6RHKQ4fsogwwHkwZmrCnf15XkDugTZLfyG8n/4yTgmCiz9Q5zvX5Dzuf1WbKjxHZtL08HzomI93ZQ3j/T/6g3JdB2wHFk4tNOmvpOJa9B3UM2iy5ENqG9AXgiIj5UtawmUubp3B1YMyJWkzQlcoLSKa3m0uHia1A2GBERk5XJRnvI+bh+ClwRIzcpYGuQ4cP0DjI8gDLIUNKwDTIcJWaQU1C0Jpv8PDk+6EjgxpEITkXdGRvap4eRpDXJz7pVRDzfRf1+Fv1MjKrmzbLbjcXI4SMbSDoIWKbP/cMRceFwVMTdzK0rpVv6qpL2JXv7fII8sv04mZHgPSNUtcYMMhwlliV7YrbORuYvt/WBBUZ43Ni+1JexIfo8vgfYodPgVDrh7Ax8RdINkq6UdELp0i1y+x/tFieHpTxCTgT7armfVe5PGq6KOEBZt95LHmm/QvaKWjQi3h8RPyebY47rYvxFHRozyHCUeJUcP/MP8nv7Izkebn7gXnKc3LAaoowNkrRFySaxGplV5HxJl5cmu0pKE9d5ZG/VyWTT8Xlk0+itdDC5aRMpJ5bcD1iXnLTyHPLa2jlkr8pzmH1+qCHlAGXd+hY5fuZhMkXO1spMzzuTM5Re1slYlRo1ZpDhKPEUcDvZa2s8maJqRfJ/eB85lGC49c3Y0BoDtTjwAzoIApImSDodWKJ06NkOuL/cbwfsQJ5BVi3vTWTX8p+Xru4v09s8eh1woGbPlj6qlM+zHtmd/uKyuNU8ek55T6XJMOvgAGXdehhYhZxG+3GyV9NiZPbwm8kL7SPhDnp3aHeSGS4eJJurJpc6Wq/FyPRU65JnUteTTTuLks1Vw576qM6MDaVX3TnA6yWdSwa3V8ku1BuWHXIn2UXOiohXysFXa8zTwWRAnxoRH2lQuqiulG7yh5HNfAAHSfoAmcJsWDlAWbduAl4km4OeI3v9LEceiS9KZwMD69Q+yHB6RGwTEd8hf1zDOshwlDgfeIzslHAmcCzwXXLn9E3gUyNUr8+SmfLfDiwPvCTpq2ST4y/Ia56VlMD2UCnvV+R1txnA/5P0mYj4/dzW76O9s8VqZbDud4ATyeS6o56kI8jruC+VRQ8DbyUT695axhoOT13czdy6IelE8gh3X3L6gj3J/HcPkN3Mv9wagT7M9ZptkGFJIGtdkrRpRAzrHEclY8MlZA+7WZLu5LUZG34REZt2UGarq/QbgC9ExCdLU9z5wHcj4vyK5Vzb+ruSfkf2BNyTktU/IrasWqemkrRklATL/by2KpmA+aFhqYsDlHVLOZncn4F/RMQTkiZHxA3ltbWBe0boOpSNcv1kbPhd5BxfretAy0XE9R2U9w6yyXfF9g4WkhYlz7Z/VKVpTtKkiHiyPN4jIv63PH4jsGnfNEM2OA5Q1hVJR5FNQ0eQ3cxbF07nI48m/9KUZLE2dkjah8ykf0RpuquyTms7XJ5MnLoOOaj8FPL623cjYt0O6rBx5FQd4yPiH23L1yc7Y/y2alk2d74GZd16Pdme/yN6Zzz9EtkW/7yDkw2GpBXL/bPKKVRaTU53kdd8XtdBcWcAG5LXSH9GTv/+HJkt4yxyIHcnvlY63NwgaV1Jr5f0erIX5IhMMzNWOZOEdas1UeHLZKeIwGOQrD5nSHoQuDMitlFOSQ+53b1CztpcNZvB08BJEfFXSauRs0o/TvbmG0/2WuyEyO7p95R1T6d3uz+vw7JsLnwGZd1akJwPZ3WyK/LC5Ay7
6wNLSHrfCNbNRr/30NvNGXoDwHFkb7m/dlDWTWTAu5EcYH4A2cy3FZnS6atVCpE0n6TPkCm+vkB2WX+ZvLb1h3LrJm2SzYHPoKxbbyW7mf8NuIIcBzU/OWjxe+Q4Fnfptq5ERJRxS/uVRa3u3S9ExMmSdu2grI/DP7MkrAlcGBEPlJd/IelLkhaIiCp5Gv8GrF8C1TiyZ+Flba+r37WsKw5Q1pWIOFLS5RGxlaSlgJ3GQhdbawZJRzP7dabWGdQ/JH2bzHZRtaz2MzEBkWnz8jHwf1WCU0S8Sp6JfZQMTBuSZ2L/CSxABqpKZ2NWjQOUdaUki11f0qXkD/3tkq4gsxEEcG9EfHok62ij2sXALcCzJX/eOmX59eSZ+VEdlDUzIjaTdEVEbNm6h2y2I8fNdeKiiPgfSY9GxP8BG0tagBw8fD6ZTcVq4G7m1hVJi5Dz9GxWFm1Cdjc/iry2eUNErDOH1c26IulwMnvJ4VFxyndJfyWvD63Vdn81Od/UA+Q1qqurTMQnaQMyvdexwEoR8YCkj5AHZDfPbZCrdc5nUNaViHhR0rdL3q7WGdWSETG9ZDH/4sjW0MaiiPgvSReSXcarrjNb/sVyLWpxckLG9YHJMYeZoftxHHmm9HmyN+F/As8Ch5XB6SeRqaKsBj6DMjOrqBx8LUfO1nxDuS6FpL3Js7KNWtklbPAcoMzMOqDZp48/KyL2VZlGfqTrNtZ4HJSZWWfau5Kv2c8yq4mvQZmZVVRyAU4sTXoCli6Plyr3AD9sNf3Z4LiJz8ysIkkfH+AtAXzPAaoePoMyM6vuQQbIN+ngVB9fgzIzq25DMpXXhmTuyTPa7jcsN6uJm/jMzDrQmp23PL45IjaQdEtEvHOk6zbW+AzKzKx77r03hBygzMw6s1nb48nl3hMVDgE38ZmZWSP5DMrMzBrJAcrMzBrJAcrMzBrJAcrMzBrp/wP0sjVs4/bf3wAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 绘制特征重要性图像\n",
    "plt.title('Feature Importance')\n",
    "# 特征数量 和 特征重要性\n",
    "plt.bar(range(factor_weight.shape[0]), factor_weight['importance'], color='blue', align='center')\n",
    "# 横轴特征名\n",
    "plt.xticks(range(factor_weight.shape[0]), factor_weight['features'], rotation=90)\n",
    "           # fontdict={'color':'red', 'size':16})\n",
    "plt.xlim([-1, factor_weight.shape[0]])\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. 回归模型预测--市值预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 支持向量机训练及预测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 选出最重要的10个特征\n",
    "features = factor_weight['features'][:10]\n",
    "x_new = df[features]\n",
    "y_new = df['市值']\n",
    "x_new = x_new.fillna(0)\n",
    "y_new = y_new.fillna(0)\n",
    "\n",
    "#将数据拆分为训练集和验证集\n",
    "x_new_train,x_new_test,y_new_train,y_new_test=\\\n",
    "train_test_split(x_new,y_new,test_size = 0.2)\n",
    "\n",
    "#对数据特征进行标准化处理\n",
    "from sklearn import preprocessing\n",
    "scaler=preprocessing.StandardScaler()\n",
    "x_new_train=scaler.fit_transform(x_new_train)\n",
    "x_new_test=scaler.transform(x_new_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>平均差</th>\n",
       "      <th>动量线</th>\n",
       "      <th>乖离率</th>\n",
       "      <th>PB</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>指数移动平均</th>\n",
       "      <th>换手率</th>\n",
       "      <th>营收增长率</th>\n",
       "      <th>PE</th>\n",
       "      <th>PS</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>-0.3986</td>\n",
       "      <td>0.58</td>\n",
       "      <td>1.003764</td>\n",
       "      <td>0.8363</td>\n",
       "      <td>14.346</td>\n",
       "      <td>14.308895</td>\n",
       "      <td>0.7716</td>\n",
       "      <td>10.57</td>\n",
       "      <td>7.2001</td>\n",
       "      <td>1.6179</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>-0.3240</td>\n",
       "      <td>1.04</td>\n",
       "      <td>0.981997</td>\n",
       "      <td>0.9085</td>\n",
       "      <td>18.330</td>\n",
       "      <td>18.316412</td>\n",
       "      <td>0.7211</td>\n",
       "      <td>0.65</td>\n",
       "      <td>9.4960</td>\n",
       "      <td>0.4748</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1.5648</td>\n",
       "      <td>0.49</td>\n",
       "      <td>-0.375717</td>\n",
       "      <td>2.2699</td>\n",
       "      <td>25.285</td>\n",
       "      <td>25.099482</td>\n",
       "      <td>1.0023</td>\n",
       "      <td>6.43</td>\n",
       "      <td>17.4166</td>\n",
       "      <td>1.0263</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>0.5460</td>\n",
       "      <td>-0.09</td>\n",
       "      <td>0.841130</td>\n",
       "      <td>2.5271</td>\n",
       "      <td>10.581</td>\n",
       "      <td>10.561782</td>\n",
       "      <td>0.8867</td>\n",
       "      <td>-9.24</td>\n",
       "      <td>53.8264</td>\n",
       "      <td>1.9665</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>-0.2518</td>\n",
       "      <td>0.48</td>\n",
       "      <td>3.374778</td>\n",
       "      <td>0.6153</td>\n",
       "      <td>5.630</td>\n",
       "      <td>5.657840</td>\n",
       "      <td>1.2700</td>\n",
       "      <td>-12.56</td>\n",
       "      <td>15.5290</td>\n",
       "      <td>0.4702</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                平均差   动量线       乖离率   ...    营收增长率       PE      PS\n",
       "000001.XSHE -0.3986  0.58  1.003764   ...    10.57   7.2001  1.6179\n",
       "000002.XSHE -0.3240  1.04  0.981997   ...     0.65   9.4960  0.4748\n",
       "000063.XSHE  1.5648  0.49 -0.375717   ...     6.43  17.4166  1.0263\n",
       "000066.XSHE  0.5460 -0.09  0.841130   ...    -9.24  53.8264  1.9665\n",
       "000069.XSHE -0.2518  0.48  3.374778   ...   -12.56  15.5290  0.4702\n",
       "\n",
       "[5 rows x 10 columns]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x_new.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "000001.XSHE    2811.9175\n",
       "000002.XSHE    2151.8584\n",
       "000063.XSHE    1192.6785\n",
       "000066.XSHE     344.1928\n",
       "000069.XSHE     477.3444\n",
       "Name: 市值, dtype: float64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_new.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.svm import SVR\n",
    "# 训练支持向量机 (**注意：运行2遍SVR，有时候容易卡，可能聚宽的系统资源有限)\n",
    "#svr = SVR(kernel=\"linear\")\n",
    "#model = svr.fit(x_new_train, y_new_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "#查看分类器在训练集和验证集中的准确率\n",
    "#print(model.score(x_new_train, y_new_train),\n",
    "#      model.score(x_new_test, y_new_test))\n",
    "# 测试使用10个特征性能 > 20个  > 大于使用所有\n",
    "# 使用随机森林对特征重要性选择，n_estimators越大，回归模型性能越好\n",
    "# 使用标准化，性能有所提升\n",
    "# ！！！得分为负值，说明回归模型性能不佳\n",
    "# !!!由于性能不佳，将在文末进行分析\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>市值</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>1030.523387</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>1085.771378</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1048.381286</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>952.710663</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>1049.253220</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                      市值\n",
       "000001.XSHE  1030.523387\n",
       "000002.XSHE  1085.771378\n",
       "000063.XSHE  1048.381286\n",
       "000066.XSHE   952.710663\n",
       "000069.XSHE  1049.253220"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "svr1 = SVR(kernel=\"linear\")\n",
    "mdl = svr1.fit(x_new, y_new)\n",
    "predict = pd.DataFrame(mdl.predict(x_new),\n",
    "                           # 保持和y相同的index，也就是股票的代码\n",
    "                           index=y_new.index,\n",
    "                           # 设置一个列名，这个根据你个人爱好就好\n",
    "                           columns=['市值'])\n",
    "predict.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "000001.XSHE    2811.9175\n",
       "000002.XSHE    2151.8584\n",
       "000063.XSHE    1192.6785\n",
       "000066.XSHE     344.1928\n",
       "000069.XSHE     477.3444\n",
       "Name: 市值, dtype: float64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_new.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.13952930473263392"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mdl.score(x_new, y_new)  # 注：如果使用默认的rgb,score为负数，说明性能不佳"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2 DNN训练及预测"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MLPRegressor结果如下：\n",
      "训练集分数： -0.4203640066186047\n",
      "验证集分数： -0.7902290943421288\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.6/site-packages/sklearn/neural_network/multilayer_perceptron.py:564: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (50) reached and the optimization hasn't converged yet.\n",
      "  % self.max_iter, ConvergenceWarning)\n"
     ]
    }
   ],
   "source": [
    "from sklearn.neural_network import MLPRegressor\n",
    "dnn = MLPRegressor(hidden_layer_sizes=(10,), random_state=1, max_iter=50, warm_start=True)\n",
    "dnn_rg = dnn.fit(x_new_train, y_new_train.ravel())\n",
    "y_pred_dnn = dnn_rg.predict(x_new_test)\n",
    "print(\"MLPRegressor结果如下：\")\n",
    "print(\"训练集分数：\",dnn_rg.score(x_new_train, y_new_train))\n",
    "print(\"验证集分数：\",dnn_rg.score(x_new_test, y_new_test))\n",
    "# 可能数据量和特征过少，性能不如随机森林"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 官网使用样例: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor\n",
    "# 官网参数详细说明：https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor\n",
    "#clf = MLPClassifier(hidden_layer_sizes=(15,), random_state=1, max_iter=2, warm_start=True)\n",
    "#>>> for i in range(10):\n",
    "#...     clf.fit(X, y)\n",
    "#...     # additional monitoring / inspection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4. 股票选择"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.1 按照真实值和预测值之差选择"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "000001.XSHE    1781.394113\n",
       "000002.XSHE    1066.087022\n",
       "000063.XSHE     144.297214\n",
       "000066.XSHE    -608.517863\n",
       "000069.XSHE    -571.908820\n",
       "Name: 市值, dtype: float64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#使用真实的市值，减去模型预测的市值\n",
    "# ！！！注意：不能使用y_new替换df['市值'],因为y_new是series类型，但predict是dataframe，无法相减\n",
    "# ！！！     或者考虑转换y_new的类型\n",
    "diff = df['市值'] - predict['市值']\n",
    "diff.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.series.Series"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 查看diff数据类型\n",
    "type(diff)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>diff</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>688169.XSHG</th>\n",
       "      <td>-3164.635979</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>300751.XSHE</th>\n",
       "      <td>-1984.493460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>002821.XSHE</th>\n",
       "      <td>-1258.285787</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>002791.XSHE</th>\n",
       "      <td>-1176.590370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>601799.XSHG</th>\n",
       "      <td>-1117.110455</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    diff\n",
       "688169.XSHG -3164.635979\n",
       "300751.XSHE -1984.493460\n",
       "002821.XSHE -1258.285787\n",
       "002791.XSHE -1176.590370\n",
       "601799.XSHG -1117.110455"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#将两者的差存入一个数据表，index还是用股票的代码\n",
    "diff1 = pd.DataFrame(diff.values, index = diff.index, columns = ['diff'])\n",
    "# #将该数据表中的值，按生序进行排列\n",
    "diff1 = diff1.sort_values(by = 'diff', ascending = True)\n",
    "diff1.head()  #找到市值被低估最多的10只股票"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "688169.XSHG   -3164.635979\n",
       "300751.XSHE   -1984.493460\n",
       "002821.XSHE   -1258.285787\n",
       "002791.XSHE   -1176.590370\n",
       "601799.XSHG   -1117.110455\n",
       "603290.XSHG   -1103.624082\n",
       "000661.XSHE   -1043.119154\n",
       "300866.XSHE    -939.378945\n",
       "603882.XSHG    -925.659392\n",
       "688005.XSHG    -898.146314\n",
       "Name: diff, dtype: float64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#找到市值被低估最多的10只股票 (即找到预测误差最小的10只股票)\n",
    "diff1['diff'][:10]\n",
    "#list(diff1.index[:10])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "002594.XSHE     8012.797418\n",
       "601988.XSHG     8585.175221\n",
       "601857.XSHG     8908.437026\n",
       "300750.XSHE     9217.897623\n",
       "600036.XSHG     9259.772806\n",
       "601288.XSHG     9556.819003\n",
       "600941.XSHG    11742.188653\n",
       "601939.XSHG    14040.759436\n",
       "600519.XSHG    15102.211273\n",
       "601398.XSHG    15816.063717\n",
       "Name: diff, dtype: float64"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#查看倒数10只误差最大的股票\n",
    "diff1['diff'][-10:]\n",
    "# list(diff1.index[-10:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2 按照真实值和预测值之差绝对值最小选择"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "000001.XSHE    1781.394113\n",
       "000002.XSHE    1066.087022\n",
       "000063.XSHE     144.297214\n",
       "000066.XSHE     608.517863\n",
       "000069.XSHE     571.908820\n",
       "Name: 市值, dtype: float64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
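    "# Absolute value of the actual market cap minus the model's predicted market cap\n",
    "# Note: y_new cannot be substituted for df['市值'] here: y_new is a Series while predict is a DataFrame, and the subtraction fails\n",
    "#       Alternatively, consider converting y_new to a matching type\n",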
    "diff2 = abs(df['市值'] - predict['市值'])\n",
    "diff2.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>diff</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000877.XSHE</th>\n",
       "      <td>0.040120</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>300896.XSHE</th>\n",
       "      <td>0.064347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>002074.XSHE</th>\n",
       "      <td>0.077142</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>600132.XSHG</th>\n",
       "      <td>0.079201</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>603260.XSHG</th>\n",
       "      <td>0.087088</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 diff\n",
       "000877.XSHE  0.040120\n",
       "300896.XSHE  0.064347\n",
       "002074.XSHE  0.077142\n",
       "600132.XSHG  0.079201\n",
       "603260.XSHG  0.087088"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
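    "# Store the absolute differences in a DataFrame, still indexed by stock code\n",
    "diff3 = pd.DataFrame(diff2.values, index = diff2.index, columns = ['diff'])\n",
    "# Sort the differences in ascending order\n",
    "diff3 = diff3.sort_values(by = 'diff', ascending = True)\n",
    "diff3.head()  # stocks with the smallest absolute prediction error (head() shows the first 5)"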
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the names of the selected stocks. Note: get_security_info returned no information here\n",
    "# for code in diff3.index[:10]:\n",
    "#    df_diff3 = get_security_info(code)\n",
    "#df_diff3.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.3 Comparing the two selection methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "same selection:  0\n"
     ]
    }
   ],
   "source": [
    "diff3_lst_10 = list(diff3.index[:10])\n",
    "cnt = 0\n",
    "for item in list(diff1.index[:10]):\n",
    "    if item in diff3_lst_10:\n",
    "        cnt += 1\n",
    "        print('same selected stock: ', item)\n",
    "print('same selection: ', cnt)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5. Model performance analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
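    "- Features were selected above with a classification model and then fed to a regression model to predict market cap, but that regression model performed poorly on the test set. Below, the result is compared against a set of manually selected features."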
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code</th>\n",
       "      <th>market_cap</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>000001.XSHE</td>\n",
       "      <td>2811.9175</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>000002.XSHE</td>\n",
       "      <td>2151.8584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>000063.XSHE</td>\n",
       "      <td>1192.6785</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>000066.XSHE</td>\n",
       "      <td>344.1928</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>000069.XSHE</td>\n",
       "      <td>477.3444</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          code  market_cap\n",
       "0  000001.XSHE   2811.9175\n",
       "1  000002.XSHE   2151.8584\n",
       "2  000063.XSHE   1192.6785\n",
       "3  000066.XSHE    344.1928\n",
       "4  000069.XSHE    477.3444"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "stocks = get_index_stocks('000300.XSHG')\n",
    "q = query(valuation.code,valuation.market_cap).filter(\n",
    "     valuation.code.in_(stocks))\n",
    "dataset = get_fundamentals(q)\n",
    "dataset.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>code</th>\n",
       "      <th>market_cap</th>\n",
       "      <th>平均差</th>\n",
       "      <th>换手率</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>乖离率</th>\n",
       "      <th>动量线</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>000001.XSHE</td>\n",
       "      <td>2811.9175</td>\n",
       "      <td>-0.3986</td>\n",
       "      <td>0.771573</td>\n",
       "      <td>14.274</td>\n",
       "      <td>1.435072</td>\n",
       "      <td>0.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>000002.XSHE</td>\n",
       "      <td>2151.8584</td>\n",
       "      <td>-0.3240</td>\n",
       "      <td>0.721109</td>\n",
       "      <td>18.460</td>\n",
       "      <td>0.162338</td>\n",
       "      <td>0.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>000063.XSHE</td>\n",
       "      <td>1192.6785</td>\n",
       "      <td>1.5648</td>\n",
       "      <td>1.002297</td>\n",
       "      <td>25.138</td>\n",
       "      <td>-0.046293</td>\n",
       "      <td>0.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>000066.XSHE</td>\n",
       "      <td>344.1928</td>\n",
       "      <td>0.5460</td>\n",
       "      <td>0.886746</td>\n",
       "      <td>10.482</td>\n",
       "      <td>1.297468</td>\n",
       "      <td>-0.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>000069.XSHE</td>\n",
       "      <td>477.3444</td>\n",
       "      <td>-0.2518</td>\n",
       "      <td>1.269952</td>\n",
       "      <td>5.708</td>\n",
       "      <td>1.985981</td>\n",
       "      <td>0.36</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          code  market_cap     平均差       换手率    移动平均       乖离率   动量线\n",
       "0  000001.XSHE   2811.9175 -0.3986  0.771573  14.274  1.435072  0.14\n",
       "1  000002.XSHE   2151.8584 -0.3240  0.721109  18.460  0.162338  0.68\n",
       "2  000063.XSHE   1192.6785  1.5648  1.002297  25.138 -0.046293  0.94\n",
       "3  000066.XSHE    344.1928  0.5460  0.886746  10.482  1.297468 -0.04\n",
       "4  000069.XSHE    477.3444 -0.2518  1.269952   5.708  1.985981  0.36"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
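    "# Manually pick 5 technical indicators for the analysis\n",
    "# Columns: 平均差 = DMA, 换手率 = HSL (turnover rate), 移动平均 = MA (moving average), 乖离率 = BIAS, 动量线 = MTM (momentum)\n",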
    "dataset['平均差'] = list(DMA(dataset.code, yesterday)[0].values())\n",
    "dataset['换手率'] = list(HSL(dataset.code, yesterday)[0].values())\n",
    "dataset['移动平均'] = list(MA(dataset.code, yesterday).values())\n",
    "dataset['乖离率'] = list(BIAS(dataset.code, yesterday)[0].values())\n",
    "dataset['动量线'] = list(MTM(dataset.code,yesterday).values())\n",
    "dataset.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>market_cap</th>\n",
       "      <th>平均差</th>\n",
       "      <th>换手率</th>\n",
       "      <th>移动平均</th>\n",
       "      <th>乖离率</th>\n",
       "      <th>动量线</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>000001.XSHE</th>\n",
       "      <td>2811.9175</td>\n",
       "      <td>-0.3986</td>\n",
       "      <td>0.771573</td>\n",
       "      <td>14.274</td>\n",
       "      <td>1.435072</td>\n",
       "      <td>0.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000002.XSHE</th>\n",
       "      <td>2151.8584</td>\n",
       "      <td>-0.3240</td>\n",
       "      <td>0.721109</td>\n",
       "      <td>18.460</td>\n",
       "      <td>0.162338</td>\n",
       "      <td>0.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000063.XSHE</th>\n",
       "      <td>1192.6785</td>\n",
       "      <td>1.5648</td>\n",
       "      <td>1.002297</td>\n",
       "      <td>25.138</td>\n",
       "      <td>-0.046293</td>\n",
       "      <td>0.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000066.XSHE</th>\n",
       "      <td>344.1928</td>\n",
       "      <td>0.5460</td>\n",
       "      <td>0.886746</td>\n",
       "      <td>10.482</td>\n",
       "      <td>1.297468</td>\n",
       "      <td>-0.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>000069.XSHE</th>\n",
       "      <td>477.3444</td>\n",
       "      <td>-0.2518</td>\n",
       "      <td>1.269952</td>\n",
       "      <td>5.708</td>\n",
       "      <td>1.985981</td>\n",
       "      <td>0.36</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             market_cap     平均差       换手率    移动平均       乖离率   动量线\n",
       "code                                                             \n",
       "000001.XSHE   2811.9175 -0.3986  0.771573  14.274  1.435072  0.14\n",
       "000002.XSHE   2151.8584 -0.3240  0.721109  18.460  0.162338  0.68\n",
       "000063.XSHE   1192.6785  1.5648  1.002297  25.138 -0.046293  0.94\n",
       "000066.XSHE    344.1928  0.5460  0.886746  10.482  1.297468 -0.04\n",
       "000069.XSHE    477.3444 -0.2518  1.269952   5.708  1.985981  0.36"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset.index = dataset.code\n",
    "dataset.drop('code', axis = 1, inplace = True)\n",
    "dataset.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "15       平均差\n",
       "12       动量线\n",
       "18       乖离率\n",
       "9         PB\n",
       "17      移动平均\n",
       "16    指数移动平均\n",
       "7        换手率\n",
       "6      营收增长率\n",
       "8         PE\n",
       "10        PS\n",
       "Name: features, dtype: object"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The 10 features previously selected by the model\n",
    "features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8610515400976911"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 1. Random forest performance (R^2 scored on the same data used for fitting, so it is an optimistic estimate)\n",
    "from sklearn.ensemble import RandomForestRegressor\n",
    "reg = RandomForestRegressor(random_state=20)\n",
    "X = dataset.drop('market_cap', axis = 1)\n",
    "y = dataset['market_cap']\n",
    "reg.fit(X,y)\n",
    "reg.score(X,y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "svr_rbf: -0.058531149897044\n",
      "svr_lin: 0.02959043347707102\n"
     ]
    }
   ],
   "source": [
    "from sklearn.svm import SVR\n",
    "# Train the support vector regressors (scores below are also R^2 on the training data)\n",
    "svr_rbf = SVR(kernel=\"rbf\", C=100, gamma=0.1, epsilon=0.1)\n",
    "svr_lin = SVR(kernel=\"linear\", C=100, gamma=\"auto\")\n",
    "#svr_poly = SVR(kernel=\"poly\", C=100, gamma=\"auto\", degree=3, epsilon=0.1, coef0=1)  # slow to train\n",
    "#svr_dict = {'svr_rbf': svr_rbf, 'svr_lin': svr_lin, 'svr_poly': svr_poly}\n",
    "svr_dict = {'svr_rbf': svr_rbf, 'svr_lin': svr_lin}\n",
    "for name, svr in svr_dict.items():\n",
    "   svr.fit(X,y)\n",
    "   print(name + ':', svr.score(X,y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Slow to train, so left commented out\n",
    "#from sklearn.svm import NuSVR\n",
    "#from sklearn.pipeline import make_pipeline\n",
    "#from sklearn.preprocessing import StandardScaler\n",
    "#regr = make_pipeline(StandardScaler(), NuSVR(C=1.0, nu=0.1))\n",
    "#regr.fit(X, y)\n",
    "#regr.score(X,y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
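    "# 6. Conclusions\n",
    "1. Feature selection: (a) random forest performs better than a decision tree (b) selecting too many features works worse than picking a few important ones; in these tests, 10 features > 20 features > all features (c) when random forest is used to rank feature importance, a larger n_estimators gave better regression performance\n",
    "2. Market-cap regression: (a) random forest > linear-kernel SVR (b) the RBF-kernel SVR and the DNN both had negative scores"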
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "MarkDown菜单",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
